  1. Last 7 days
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors develop a self-returning self-avoiding polymer model of chromosome organization and show that their framework can recapitulate at the same time local density and large-scale contact structural properties observed experimentally by various technologies. The presented theoretical framework and the results are valuable for the community of modelers working on 3D genomics. The work provides solid evidence that such a framework can be used, is reliable in describing chromatin organization at multiple scales, and could represent an interesting alternative to standard molecular dynamics simulations of chromatin polymer models.

      We appreciate the editor for an accurate description of the scope of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Carignano et al propose an extension of the self-returning random walk (SRRW) model for chromatin to include excluded volume aspects and use it to investigate generic local and global properties of the chromosome 3D organization inside eukaryotic nuclei. In particular, they focus on chromatin volumic density, contact probability, and domain size and suggest that their framework can recapitulate several experimental observations and predict the effect of some perturbations.

      We thank the reviewer for the attention paid to the manuscript and for all the relevant comments.

      Strengths:

      - The developed methodology is convincing and may offer an alternative - less computationally demanding - framework to investigate the single-cell and population structural properties of 3D genome organization at multiple scales.

      - Compared to the previous SRRW model, it allows for investigation of the role of excluded volume locally.

      Excluded volume is accounted for everywhere, not locally. We emphasized this on page 3, line 182:

      “The method that we employ to remove overlaps is a low-temperature-controlled molecular dynamics simulation using a soft repulsive interaction potential between initially overlapping beads, that is terminated as soon as all overlaps have been resolved, as described in the Appendix 3.”
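
The idea of resolving every overlap, regardless of how far apart two beads sit along the chain, can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: instead of the low-temperature molecular dynamics with a soft repulsive potential described in their Appendix 3, it uses a plain geometric push-apart loop. The function names, the 0.1% safety margin and the iteration cap are all assumptions made for this illustration.

```python
import math
import random


def count_overlaps(beads, diameter):
    """Count bead pairs whose centres are closer than one diameter."""
    n = len(beads)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(beads[i], beads[j]) < diameter
    )


def resolve_overlaps(beads, diameter, max_iter=10_000, seed=0):
    """Push overlapping beads apart until no pair overlaps.

    Hypothetical stand-in for a soft-repulsion relaxation: every overlapping
    pair, whatever its separation along the chain, is moved symmetrically
    along its separation vector. The loop stops as soon as a full sweep
    finds no overlap.
    """
    rng = random.Random(seed)
    beads = [list(b) for b in beads]
    target = diameter * 1.001  # assumed small margin so the loop terminates
    n = len(beads)
    for _ in range(max_iter):
        moved = False
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(beads[i], beads[j])
                if d >= diameter:
                    continue
                moved = True
                if d == 0.0:
                    # Coincident beads: pick a random direction to separate them.
                    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
                    norm = math.sqrt(sum(x * x for x in v))
                    direction = [x / norm for x in v]
                else:
                    direction = [(b - a) / d for a, b in zip(beads[i], beads[j])]
                shift = (target - d) / 2.0
                for k in range(3):
                    beads[i][k] -= shift * direction[k]
                    beads[j][k] += shift * direction[k]
        if not moved:
            return beads
    raise RuntimeError("overlaps not resolved within max_iter sweeps")
```

Because the pair loop runs over all `i < j`, beads that are distant along the chain but close in space are treated exactly like neighbouring beads, which is the point the response makes. Unlike a real relaxation, this sketch does not preserve bond lengths between consecutive beads.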


      - They perform some experiments to compare with model predictions and show consistency between the two.

      Weaknesses:

      - The model is a homopolymer model and currently cannot fully account for specific mechanisms that may shape the heterogeneous, complex organization of chromosomes (TAD at specific positions, A/B compartmentalization, promoter-enhancer loops, etc.).

      The SR-EV model is definitely not a homo-polymer, as it is not a regular concatenation of a single monomeric unit.

      The model includes loops, which may happen in two ways: 1) As in the SRRW, branching structures emerging from the configuration backbone can be interpreted as nested loops and 2) A relatively long forward step followed by a return is a single loop. The model induces the formation of packing domains, which are not TADs, and are quantitatively in agreement with ChromSTEM experiments.

      We considered it convenient to add a new figure that further clarifies the structures obtained with the SR-EV model. The following paragraph and figure have been added on page 5:

      “The density heterogeneity displayed by the SR-EV configurations can be analyzed in terms of the accessibility. One way to reveal this accessibility is by calculating the coordination number (CN) for each nucleosome, using a coordination radius of 11.5 nm, along the SR-EV configuration. CN values range from 0 for an isolated nucleosome to 12 for a nucleosome immersed in a packing domain. In Figure 3 we show the SR-EV configuration shown in Figure 2, but colored according to CN. CN can also be considered as a measure to discriminate heterochromatin (red) and euchromatin (blue). Figure 3-A shows how the density inhomogeneity is coupled to different CN, with high CN represented in red and low CN represented in blue. Figure 3-B shows a 50 nm thick slab obtained from the same configuration that clearly shows that the nucleosomes at the center of each packing domain are almost completely inaccessible, while those outside are open and accessible. It is also clear that the surfaces of the packing domains are characterized by nearly white nucleosomes, i.e. coordinated towards the center of the domain and open in the opposite direction.”
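
A coordination number of the kind described in the quoted paragraph can be computed as a neighbour count within a fixed radius. The sketch below is a minimal illustration, assuming nucleosome centres are given as 3D coordinates in nm and that "coordinated" means a centre-to-centre distance of at most the coordination radius. The function name and the inclusive cutoff are assumptions, not details taken from the manuscript.

```python
import math


def coordination_numbers(positions, radius=11.5):
    """Return, for each bead, the number of other beads within `radius`.

    positions: sequence of (x, y, z) coordinates, assumed to be in nm.
    radius:    coordination radius in nm (11.5 nm in the quoted text).
    """
    n = len(positions)
    cn = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each qualifying pair contributes one neighbour to both beads.
            if math.dist(positions[i], positions[j]) <= radius:
                cn[i] += 1
                cn[j] += 1
    return cn
```

The double loop scales quadratically with the number of beads, so for chromosome-sized configurations a spatial index (cell list or k-d tree) would be the more practical choice.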

      - By construction of their framework, the effect of excluded volume is only local and larger-scale properties for which excluded volume could be a main actor (formation of chromosome territories [Rosa & Everaers, PLoS CB 2009], bottle-brush effects due to loop extrusion [Polovnikov et al, PRX 2023], etc.) cannot be captured.

      Excluded volume is considered for all nucleosomes, including overlapping beads that are distant along the polymer chain. Chromosome territories can be treated, but they are not in this case because we consider a single model chromosome.

      - Apart from being a computationally interesting approach to generating realistic 3D chromosome organization, the method offers fewer possibilities than standard polymer models (eg, MD simulations) of chromatin (no dynamics, no specific mechanisms, etc.) with likely the same predictive power under the same hypotheses. In particular, authors often claim the superiority of their approach to describing the local chromatin compaction compared to previous polymer models without showing it or citing any relevant references that would show it.

      We apologize if the text conveys an idea of superiority over other methods; that was not intended. SR-EV is an alternative tool that may give a different, even complementary, point of view to standard polymer models.

      - Comparisons with experiments are solid but are not quantified.

      The comparisons that we have presented are quantitative. We do not yet have a way to characterize alpha or phi a priori for a particular system.

      Impact:

      Building on the presented framework in the future to incorporate TAD and compartments may offer an interesting model to study the single-cell heterogeneity of chromatin organization. But currently, in this reviewer's opinion, standard polymer modeling frameworks may offer more possibilities.

      We thank the reviewer for the positive opinion on the potential of the presented method. The incorporation of TADs and compartments is left for a future evolution of the model as its complexity will make this work extremely long.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a simple Self Returning Excluded Volume (SR-EV) model to investigate the 3D organization of chromatin. This is a random walk with a probability to self-return accounting for the excluded volume effects. The authors use this method to study the statistical properties of chromatin organization in 3D. They compute contact probabilities, 3D distances, and packing properties of chromatin and compare them with a set of experimental data.

      We thank the reviewer for the attention paid to our manuscript.

      Strengths:

      (1) Typically, to generate a polymer with excluded volume interactions, one needs to run long simulations with computationally expensive repulsive potentials like the Weeks-Chandler-Andersen potential. However, here, instead of performing long simulations, the authors have devised a method where they can grow a polymer, enabling quick generation of configurations.

      (2) Authors show that the chromatin configurations generated from their models do satisfy many of the experimentally known statistical properties of chromatin. Contact probability scalings and packing properties are comparable with Chromatin Scanning Transmission Electron Microscopy (ChromSTEM) experimental data from some of the cell types.

      Weaknesses:

      This can only generate broad statistical distributions. This method cannot generate sequence-dependent effects, specific TAD structures, or compartments without a prior model for the folding parameter alpha. It cannot generate a 3D distance between specific sets of genes. This is an interesting soft-matter physics study. However, the output is only as good as the alpha value one provides as input.

      We proposed a model that creates realistic chromatin configurations, which we have compared with specific single-cell experiments and which also reproduce ensemble-average properties. 3D distances between genes can be calculated after mapping the genome to the SR-EV configuration. The future incorporation of the genome sequence will also allow us to describe TADs and A/B compartments. See the paragraph added to the Discussion section:

      “The incorporation of genomic character to the SR-EV model will allow us to study all individual single chromosomes properties, and also topological associated domains and A/B compartmentalization from ensemble of configurations as in HiC experiments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major:

      - In the introduction and along the text, the authors are often making strong criticisms of previous works (mostly polymer simulation-based) to emphasize the need for an alternative approach or to emphasize the outcomes of their model. Most of these statements (see below) are incomplete if not wrong. I would suggest tuning down or completely removing them unless they are explicitly demonstrated (eg, by explicit quantitative comparisons). There is no need to claim any - fake - superiority over other approaches to demonstrate the usefulness of an approach. Complementarity or redundance in the approaches could also be beneficial.

      We regret if we unintentionally transmitted a claim of superiority. We have made several small edits to change that.

      - Line 42-43: at least there exist many works towards that direction (including polymer modeling, but also statistical modeling). For eg, see the recent review of Franck Alber.

      Line removed. Citation to Franck Alber included below in the text.

      - Line 54-57: Point 1 is correct but is it a fair limitation? These models can predict TADs & compartments while SR-EV no. Point 2 is wrong, it depends on the resolution of the model and computer capacity but it is not an intrinsic limitation. Point 3 is wrong, such models can predict very well single-cell properties, and again it is not an intrinsic limitation of the model. Point 4 is incorrect. The space-filling/fractal organization was an (unfortunate) picture to emphasize the typical organization of chromosomes in the early times (2009), but crumpled polymers which are a more realistic description are not space-filling (see Halverson et al, 2013).

      Text involving points 1 to 4 removed. It was unnecessary and does not change the line of the paper.

      - L400-402 + 409-411: in such a model, the biphasic structure may emerge from loop extrusion but also naturally from the crumpled polymer organization. Simple crumpled polymer without loop extrusion and phase separation would also produce biphasic structures.

      Yes, we agree. SR-EV also leads to biphasic structures.

      - L 448-449: any data to show that existing polymer modeling would predict a strong dependency of C_p(n) on the volumic fraction (in the range studied here)?

      No, we are not aware of any work predicting that.

      - Fig. 4:

      - Large-scale structural properties (R^2(n) and C_p(n)) are not dependent on phi. Is it surprising that by construction, SR-EV only relaxes the system locally after SRRW application?

      Excluded volume is considered at all length scales. However, as the decreasing C_p curves observed in theories and experiments imply, the fraction of overlaps (or contacts) is more important at small separations (local) than at large separations. Still, we were surprised to observe a negligible effect of phi.

      - Why not make a quantitative comparison between predicted and measured C_p(n)? Or at least plotting them on the same panel.

      Panels B and C are on the same scale and show good agreement between SR-EV and experiments. However, the agreement is not perfectly quantitative. SR-EV represents the generic structure of chromatin, and perfect agreement should not be expected.

      - Comparison with an average C_p(n) over all the chromosomes would be better.

      Possibly, but we don’t think it adds anything to the paper.
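
For readers less familiar with the quantity discussed in the Fig. 4 comments, the contact probability C_p(n) of a single configuration can be estimated as the fraction of bead pairs separated by n beads along the chain that lie within a contact distance in 3D. The sketch below is a minimal illustration under that assumption. The function name and the `cutoff` parameter are hypothetical, and the contact criterion used in the manuscript may differ.

```python
import math


def contact_probability(positions, cutoff, max_sep=None):
    """Estimate C_p(n) for one configuration.

    positions: sequence of (x, y, z) bead coordinates, ordered along the chain.
    cutoff:    assumed contact distance, in the same units as the coordinates.
    max_sep:   largest chain separation n to evaluate (default: all).

    Returns a dict mapping separation n to the fraction of pairs (i, i + n)
    whose 3D distance is at most `cutoff`.
    """
    n_beads = len(positions)
    if max_sep is None:
        max_sep = n_beads - 1
    cp = {}
    for sep in range(1, max_sep + 1):
        pairs = n_beads - sep
        contacts = sum(
            1
            for i in range(pairs)
            if math.dist(positions[i], positions[i + sep]) <= cutoff
        )
        cp[sep] = contacts / pairs
    return cp
```

An ensemble curve comparable to Hi-C would average these per-configuration values over many configurations.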

      - In Figure 5,6,7 (and related text): authors often describe some parameter values that are 'closest to experiment findings'. Can the authors quantify/justify this? The various 'closest' parameters are different. Can the authors comment?

      The folding parameter and average volume fraction are chosen so that the agreement is best with the displayed experimental system, which is a different cell type in each case.

      - Figure 5: why not show the experimental distribution from Ou et al?

      - Figure 6 & 7: experimental results. Can the authors show images from their own experiments? Can they show that cohesion/RAD21 is really depleted after auxin treatment?

      It is currently under review in a different journal.

      - In the Discussion, a fair discussion on the limitations of the methods (dynamics, etc) is missing.

      Minor

      - Line 34-36: the logical relationship between this sentence and the ones before and after is very unclear.

      - Along the text, authors use the term 'connectivity' to describe 3D (Hi-C) contacts between different regions of the same chromosome/polymer. This is misleading as connectivity in polymer physics describes the connection along the polymer and not in the 3D space.

      No. We do not think we used connectivity in that sense. We agree with your statement on the use of connectivity in polymer physics, and that is what we always had in mind for this model.

      - Line 92: typo.

      - On the SR-EV method: does the relaxation process create local knots in the structure?

      We have not checked for knots.

      - Table 1: the good correspondence with linker length is remarkable but likely 'fortunate', other chosen resolutions would have led to other results. Moreover, the model cannot account for the fine structure of chromatin fiber. Can the authors comment on that?

      It is fortunate only to the extent that we sampled the model parameters so as to capture the overall structure of chromatin.

      - Line 211: 'without the need of imposing any parameter': alpha is a parameter, no?

      Correct. Phrase deleted.

      - L267-269 & 450-451: actually in Liu & Dekker, they do observe an effect on Hi-C map (C_p(n)), weak but significant and not negligible.

      Our statements read ‘minimal’ and ‘relatively insensitive’. It is observed, but very small.

      - L283-286: This is a perspective statement that should be in the discussion.

      Moved to the Discussion, as suggested.

      - L239-241: The authors seem to emphasize some contradictions with recent results on phase separation. This is unclear and should be relocated to discussion.

      We just pointed out recent experiments, as stated. No intention to generate a discussion with any of them.

      - L311-313: Unclear statement.

      - L316-325: This is not results but discussion/speculation.

      Moved to Discussion.

      - Along the text: 'promotor'-> 'promoter'.

      Corrected.

      - L364: explain more in detail PWS microscopy.

      Reviewer #2 (Recommendations For The Authors):

      Even though there are claims about nucleosome-resolution chromatin polymer, it is not clear that this work can generate structures with known nucleosome-resolution features. Nucleosome-level structure is much beyond a random walk with excluded volume and is driven by specific interactions. The authors should clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang, Hu et al. examined the molecular mechanisms underlying astrocyte activation and its implications for multiple sclerosis. This study shows that the glycolytic enzyme PKM2 relocates to astrocyte nuclei upon activation in EAE mice. Inhibiting PKM2's nuclear import reduces astrocyte activation, as evidenced by decreased proliferation, glycolysis, and inflammatory cytokine release. Crucially, the study identifies TRIM21 as pivotal in regulating PKM2 nuclear import via ubiquitination. TRIM21 interacts with PKM2, promoting its nuclear translocation and enhancing its activity, affecting multiple signaling pathways. Confirmatory analyses using single-cell RNA sequencing and immunofluorescence demonstrate TRIM21 upregulation in EAE astrocytes. Modulating TRIM21 expression in primary astrocytes impacts PKM2-dependent glycolysis and proliferation. In vivo experiments targeting this mechanism effectively mitigate disease severity, CNS inflammation, and demyelination in EAE.

      The authors supported their claims with various experimental approaches, however, some results should be supported with higher-quality images clearly depicting the conclusions and additional quantitative analyses of Western blots.

      We thank the reviewer for the comments. We agree and have added higher-magnification images, for example in Fig. 2A to better visualize the localization of PKM2 in DASA-58-treated conditions, and in Fig. 3A and Fig. 3B to better visualize pSTAT3 and pp65. Moreover, we have added quantitative analyses of Western blots for some key experiments. For example, quantitative results for Fig. 2D are added in Fig. S3 to show the change of PKM2 and p-c-myc in DASA-58-treated conditions, and quantitative results for Fig. 3D are added in Fig. S4B and S4C to show the change of nuclear and cytoplasmic PKM2, STAT3 and NF-κB in different conditions.

      Strength:

      This study presents a comprehensive investigation into the function and molecular mechanism of metabolic reprogramming in the activation of astrocytes, a critical aspect of various neurological diseases, especially multiple sclerosis. The study uses the EAE mouse model, which closely resembles MS. This makes the results relevant and potentially translational. The research clarifies how TRIM21 regulates the nuclear import of PKM2 through ubiquitination by integrating advanced techniques. Targeting this axis may have therapeutic benefits since lentiviral vector-mediated knockdown of TRIM21 in vivo significantly reduces disease severity, CNS inflammation, and demyelination in EAE animals.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The authors reported that PKM2 levels are elevated in the nucleus of astrocytes at different EAE phases compared to cytoplasmic localization. However, Figure 1 also shows elevated cytoplasmic expression of PKM2. The authors should clarify the nuclear localization of PKM2 by providing zoomed-in images. An explanation for the increased cytoplasmic PKM2 expression should be provided. Similarly, while PKM2 translocation is inhibited by DASA-58, in addition to its nuclear localization, a decrease in the cytoplasmic localization of PKM2 is also observed. This raises the possibility that a degradation mechanism is involved when the nuclear translocation of PKM2 is inhibited.

      According to the results of immunofluorescence staining of PKM2 in the spinal cord of EAE mice and in cultured primary astrocytes, in addition to PKM2 nuclear translocation in EAE conditions, we observed elevated expression of PKM2 in astrocytes, in both the cytoplasm and the nucleus. Various studies in neurological diseases have shown consistent results. For example, following spinal cord injury (SCI), both upregulated expression and nuclear translocation of PKM2 were observed in astrocytes (Zhang et al., 2015). In EAE conditions, CNS inflammation is elevated, and several proinflammatory cytokines and chemokines might contribute to the upregulated expression of PKM2 in astrocytes. We have tested TNFα and IL-1β, which are recognized to play important roles in EAE and MS (Lin and Edelson, 2017, Wheeler et al., 2020), and western blot results showed increased expression of PKM2 upon stimulation with TNFα and IL-1β (Author response image 1). Moreover, according to the reviewer’s suggestions, we have added zoomed-in images for Figure 2A.

      Additionally, as the reviewer noted, the cytoplasmic PKM2 level decreased; degradation-related and other mechanisms might be involved in this process.

      Author response image 1.

      Upregulated expression of PKM2 in astrocytes following stimulation with TNF-α and IL-1β. Primary astrocytes were stimulated with TNF-α and IL-1β (50 ng/mL) for 48 h and western blotting analysis was performed.

      In Figure 3D, the authors claim that PKM2 expression causes nuclear retention of STAT3, p65, and p50, and inhibiting PKM2 localization with DASA-58 suppresses this retention. The western blot results for the MOG-stimulated group show high levels of STAT3, p50, and p65 in nuclear localization. However, in the MOG and DASA-58 treated group, one would expect high levels of p50, p65, and STAT3 proteins in the cytoplasm, while their levels decrease in the nucleus. These western blot results could be expanded. Additionally, intensity quantification for these results would be beneficial to see the statistical difference in their expressions, especially to observe the nuclear localization of PKM2.

      We agree with the reviewer’s comments and have added the quantification of STAT3, p50 and p65 for Fig. 3D in Fig. S4B and Fig. S4C. Nevertheless, given that DASA-58 did not trigger a notable increase in the cytoplasmic level of PKM2, we did not detect an upregulation of STAT3, p50, or p65 in the cytoplasm of the MOG and DASA-58-treated groups. With the quantification results, the changes of these proteins in different conditions are easier to see.

      The discrepancy between Figure 7A and its explaining text is confusing. The expectation from the knocking down of TRIM21 is the amelioration of activated astrocytes, leading to a decrease in inflammation and the disease state. The presented results support these expectations, while the images showing demyelination in EAE animals are not highly supportive. Clearly labeling demyelinated areas would enhance readers' understanding of the important impact of TRIM21 knockdown on reducing the disease severity.

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have made the corrections in the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have accordingly revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      Additionally, we have added the whole image of the spinal cord for MBP in Author Response image 2. Moreover, we have labelled the demyelinated areas to facilitate readers’ understanding.

      Author response image 2.

      MBP staining of the whole spinal cord in EAE mice from shVec and shTRIM21 group. Scale bar: 100 μm. Demyelinated areas are marked with dashed lines.

      Reviewer #2 (Public Review):

      This study significantly advances our understanding of the metabolic reprogramming underlying astrocyte activation in neurological diseases such as multiple sclerosis. By employing an experimental autoimmune encephalomyelitis (EAE) mouse model, the authors discovered a notable nuclear translocation of PKM2, a key enzyme in glycolysis, within astrocytes.

      Preventing this nuclear import via DASA-58 substantially attenuated primary astrocyte activation, characterized by reduced proliferation, glycolysis, and inflammatory cytokine secretion. Moreover, the authors uncovered a novel regulatory mechanism involving the ubiquitin ligase TRIM21, which mediates PKM2 nuclear import. TRIM21 interaction with PKM2 facilitated its nuclear translocation, enhancing its activity in phosphorylating STAT3, NFκB, and c-myc. Single-cell RNA sequencing and immunofluorescence staining further supported the upregulation of TRIM21 expression in astrocytes during EAE.

      Manipulating this pathway, either through TRIM21 overexpression in primary astrocytes or knockdown of TRIM21 in vivo, had profound effects on disease severity, CNS inflammation, and demyelination in EAE mice. This comprehensive study provides invaluable insights into the pathological role of nuclear PKM2 and the ubiquitination-mediated regulatory mechanism driving astrocyte activation.

      The authors' use of diverse techniques, including single-cell RNA sequencing, immunofluorescence staining, and lentiviral vector knockdown, underscores the robustness of their findings and interpretations. Ultimately, targeting this PKM2-TRIM21 axis emerges as a promising therapeutic strategy for neurological diseases involving astrocyte dysfunction.

      While the strengths of this piece of work are undeniable, some concerns could be addressed to refine its impact and clarity further, as outlined in the recommendations for the authors.

      We thank the reviewer for the comments and the positive evaluation of our work. We have answered each question in the recommendations section.

      Reviewer #3 (Public Review):

      Summary:

      Pyruvate kinase M2 (PKM2) is a rate-limiting enzyme in glycolysis and its translocation to the nucleus in astrocytes in various nervous system pathologies has been associated with a metabolic switch to glycolysis which is a sign of reactive astrogliosis. The authors investigated whether this occurs in experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS). They show that in EAE, PKM2 is ubiquitinated by TRIM21 and transferred to the nucleus in astrocytes. Inhibition of TRIM21-PKM2 axis efficiently blocks reactive gliosis and partially alleviates symptoms of EAE. Authors conclude that this axis can be a potential new therapeutic target in the treatment of MS.

      Strengths:

      The study is well-designed, controls are appropriate and a comprehensive battery of experiments has been successfully performed. Results of in vitro assays, single-cell RNA sequencing, immunoprecipitation, RNA interference, molecular docking, and in vivo modeling etc. complement and support each other.

      Weaknesses:

      Though EAE is a valid model of MS, a proposed new therapeutic strategy based on this study needs to have support from human studies.

      We agree that, although we have clarified the therapeutic potential of targeting TRIM21 or PKM2 in EAE, a mouse model of MS, application in humans warrants further study. Before TRIM21 can be considered as a target for treating multiple sclerosis in clinical trials, several issues need to be addressed to ensure safety, efficacy and feasibility. One such issue is the development of drugs that specifically target TRIM21 in the brain, can cross the blood-brain barrier and have minimal off-target effects. The translation of preclinical findings into clinical trials poses a significant challenge. To provide evidence for the similarities between the EAE model and multiple sclerosis, we screened GEO datasets (Author response image 3). GSE214334 analyzed transcriptional profiles of normal-appearing white matter from non-MS controls and different subtypes of the disease (RRMS, SPMS and PPMS). Although no statistically significant difference was observed among the groups, TRIM21 expression tends to increase in SPMS (secondary progressive MS) and PPMS (primary progressive MS) patients. In GSE83670, astrocytes from 3 control white matter samples and 4 multiple sclerosis normal-appearing white matter (NAWM) samples were analyzed. TRIM21 mRNA expression is higher in the MS group (78.73 ± 10.44) than in the control group (46.67 ± 24.15). Although these two GEO datasets did not yield statistically significant differences, TRIM21 expression appears to be elevated in the white matter of MS patients compared to controls.

      To address this limitation, we have incorporated the following statement in the discussion section: “However, whether TRIM21-PKM2 could potentially serve as therapeutic targets in multiple sclerosis warrants further studies.”

      Author response image 3.

      TRIM21 expression in control and MS patients based on published GEO datasets. (A) The expression of TRIM21 in normal-appearing white matter in non-MS (Ctl) and different clinical subtypes of MS (RRMS, SPMS, PPMS) based on GSE214334 (one-way ANOVA). (B) The expression of TRIM21 in multiple sclerosis normal-appearing white matter (NAWM) and control WM based on GSE83670 (unpaired Student's t test). RRMS, relapsing-remitting MS; SPMS, secondary progressive MS; PPMS, primary progressive MS. Data are represented as the means ± SEM.

      Reviewer #4 (Public Review):

      Summary:

      The authors report the role of the Pyruvate Kinase M2 (PKM2) enzyme nuclear translocation as fundamental in the activation of astrocytes in a model of experimental autoimmune encephalomyelitis (EAE). They show that astrocytes, activated through culturing in EAE splenocytes medium, increase their nuclear PKM2 with consequent activation of NFkB and STAT3 pathways. Prevention of PKM2 nuclear translocation counteracts this activation. The authors found that the E3 ubiquitin ligase TRIM21 interacts with PKM2 and promotes its nuclear translocation. In vivo, either silencing of TRIM21 or inhibition of PKM2 nuclear translocation ameliorates the severity of the disease in the EAE model.

      Strengths:

      This work contributes to the knowledge of the complex action of the PKM2 enzyme in the context of an autoimmune-neurological disease, highlighting its nuclear role and a novel partner, TRIM21, and thus adding a novel rationale for therapeutic targeting.

      Weaknesses:

      Despite the relevance of the work and its goals, some of the conclusions drawn would require more thorough proof:

      I believe that the major weakness is the fact that TRIM21 is known to have per se many roles in autoimmune and immune pathways and some of the effects observed might be due to a PKM2-independent action. Some of the experiments to link the two proteins, besides their interaction, do not completely clarify the issue. On top of that, the in vivo experiments address the role of TRIM21 and the nuclear localisation of PKM2 independently, thus leaving the matter unsolved.

      We agree that TRIM21 has multiple functions and that some of its effects may be due to PKM2-independent actions. TRIM21 functions as a ubiquitin ligase with a variety of substrates. Here we identify PKM2 as one of its interacting proteins, and our focus is the relationship between TRIM21 and the nuclear translocation of PKM2. We used diverse experiments to clarify this relationship, for example immunoprecipitation, western blotting, immunofluorescence, and cyto-nuclear protein extraction. These experiments are key points of our study. The in vitro results suggest that both TRIM21 and PKM2 are potential targets for EAE treatment. As expected, in the in vivo experiments, targeting either TRIM21 or PKM2 nuclear transport ameliorated EAE. To test the relationship between TRIM21 and PKM2 nuclear transport in vivo, we stained PKM2 in shVec- and shTRIM21-treated mice. As expected, knocking down TRIM21 led to a decrease in the nuclear staining of PKM2 in spinal cord astrocytes in EAE models (Figure S7A). This observation underscores that the therapeutic potential of inhibiting TRIM21 in astrocytes in vivo might be partially due to the resulting reduction in nuclear translocation of PKM2.

      Some experimental settings are not described to a level that is necessary to fully understand the data, especially for a non-expert audience: e.g. the EAE model and MOG treatment; action and reference of the different nuclear import inhibitors; use of splenocyte culture medium and the possible effect of non-EAE splenocytes.

      Following the reviewer’s suggestions, we have added more detailed descriptions to the Materials and Methods section; for example, the use of splenocyte culture medium, mass spectrometry, and HE and LFB staining have been added. More details have been incorporated into the part on “EAE induction and isolation and culture of primary astrocytes”. Moreover, references for the use of DASA-58 in vitro and TEPP-46 in vivo as inhibitors of PKM2 nuclear transport were added.

      The statement that PKM2 is a substrate of TRIM21 ubiquitin ligase activity is an overinterpretation. There is no evidence that this interaction results in ubiquitin modification of PKM2; the ubiquitination experiment is minimal and is not performed in conditions that would allow us to see ubiquitination of PKM2 (e.g. denaturing conditions, reciprocal pull-down, catalytically inactive TRIM21, etc.).

      To prevent misunderstanding, we have revised certain statements in the manuscript. In the updated version, the description is as follows: Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General recommendations:

      - The whole manuscript needs language editing.

      We appreciate the comments of the reviewers. We have improved the writing of the manuscript. All modifications are underlined.

      - Details of many experiments are not given in the materials and methods.

      According to the reviewer’s suggestions, we have added more details for experiments in the materials and methods. For example, “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes”, “mass spectrometry”, “Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining” were added in the section of Materials and Methods. More detailed information is given for EAE induction and isolation and culture of primary astrocytes.

      - Line properties in graphics should be corrected, some lines in box plots and error bars are very weak and hardly visible. Statistical tests should be included in figure legends as well. Statistical differences should be mentioned for control vs DASA-58 (alone) in all related figures.

      We have revised the figures to enhance their visibility by thickening the lines and error bars. In accordance with the reviewer’s suggestions, we have incorporated statistical tests in the figure legends. Moreover, statistical analysis was performed among all groups; if no asterisk is indicated in the figure legend and figure panels, there is no statistical difference between the control and DASA-58 groups. For most of the experiments conducted in our study, including lactate production, glucose consumption, the EdU and CCK8 analyses, and the changes in the STAT3 and NF-κB pathways, no statistical difference was observed between the control and DASA-58 groups. A likely reason is that in unstimulated astrocytes the expression of PKM2 is low and little PKM2 translocates to the nucleus, which may explain why DASA-58 did not exert the anticipated effect. Thus, in our experiments, we used MOGsup to stimulate astrocytes, enabling us to observe the impact of DASA-58 on astrocyte proliferation and glycolysis under this condition.

      - Scale bars, arrows, and labeling in the images are not visible.

      We have improved the images according to the reviewer’s suggestions. The scale bars and arrows have been made thicker and the labels larger, so that they are now clearly visible in the updated figures.

      - Quantitative analysis of all western blot results and their statistics could be provided in every image and for every protein.

      For western blotting results that were further processed by quantitative analysis (for example, Fig. 2D, Fig. 5G, Fig. 6A and 6B, and Fig. S4), we have added the statistics in the raw data sections. The other western blot results, for example the IP analyses used to assess protein-protein binding, were not further processed by quantitative analysis.

      - Proteins that are used for normalizations in western blots should be stated in the text.

      We have added a description of the proteins used for normalization in western blots to the figure legends. Moreover, the proteins used for normalization are indicated in the figure panels. For whole-cell lysates, the protein level is normalized to that of β-actin. For nuclear and cytoplasmic fractions, nuclear protein is normalized to the expression of lamin and cytoplasmic protein to the expression of tubulin.
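
      The normalization scheme described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function name `normalize_band`, the dictionary `LOADING_CONTROL`, and the example intensities are hypothetical, and the densitometry values are assumed to be background-subtracted already.

```python
# Hypothetical sketch of loading-control normalization for western blot bands.
# The mapping of sample type to loading control follows the response text;
# all names and numbers here are illustrative assumptions.
LOADING_CONTROL = {
    "whole": "beta-actin",      # whole-cell lysate
    "nuclear": "lamin",         # nuclear fraction
    "cytoplasmic": "tubulin",   # cytoplasmic fraction
}


def normalize_band(fraction: str, target_intensity: float, controls: dict) -> float:
    """Return the target band intensity divided by its matching loading control."""
    control_name = LOADING_CONTROL[fraction]
    control_intensity = controls[control_name]
    if control_intensity <= 0:
        raise ValueError(f"{control_name} intensity must be positive")
    return target_intensity / control_intensity


if __name__ == "__main__":
    # Made-up densitometry values for one nuclear-fraction lane.
    print(normalize_band("nuclear", 50.0, {"lamin": 100.0}))
```

      Keeping the fraction-to-control mapping in one place makes it harder to normalize a nuclear band to a cytoplasmic control by mistake.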

      - The manuscript investigates the role of TRIM21 in the nuclear localization of PKM2 in astrocytes in EAE mice, however almost no information is given about TRIM21 in the introduction. Extra information is given for PKM2, yet can be concisely explained.

      We have added a paragraph describing TRIM21 to the introduction section. The description is as follows: “TRIM21 belongs to the TRIM protein family which possess the E3 ubiquitin ligase activity. In addition to its well-recognized function in antiviral responses, emerging evidences have documented the multifaceted role of TRIM21 in cell cycle regulation, inflammation and metabolism (Chen et al., 2022). Nevertheless, the precise mechanisms underlying the involvement of TRIM21 in CNS diseases remain largely unexplored.”

      - "As such, deciphering glycolysis-dominant metabolic switch in astrocytes is the basis for understanding astrogliosis and the development of neurological diseases such as multiple sclerosis." The sentence could be supported by references.

      To support this sentence, we have added the following references:

      (1) Xiong XY, Tang Y, Yang QW. Metabolic changes favor the activity and heterogeneity of reactive astrocytes. Trends in endocrinology and metabolism: TEM 2022;33(6):390-400.

      (2) das Neves SP, Sousa JC, Magalhães R, Gao F, Coppola G, Mériaux S, et al. Astrocytes Undergo Metabolic Reprogramming in the Multiple Sclerosis Animal Model. Cells 2023;12(20):2484.

      Figure 1/Result 1:

      - Figure 1A-B: Quality of the images should be improved.

      Following the reviewer’s suggestion, we have improved the image quality; higher-resolution images were added in Figure 1A and Figure 1B.

      - Control images of Figure 1B are not satisfying. GFAP staining is very dim. Images from control cells should be renewed.

      As the reviewer suggested, we have renewed the control images and added the DAPI staining images for all groups. Compared with MOGsup-stimulated astrocytes, the control cells are not in an activated state and GFAP expression is relatively low.

      - Labelings on the images are not sufficient, arrows and scale bars are not visible.

      We have improved the images including labels, arrows and scale bars in all figures.

      - How splenocytes were obtained from MOG induced mice were not given in the material and methods section. Thus, it should be clearly stated how splenocyte supernatant is generated (treatment details).

      We have added detailed information on splenocyte isolation and splenocyte supernatant, under the heading “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes”, to the Materials and Methods section. “Splenocytes were isolated from EAE mice 15 d (disease onset) after MOG35-55 immunization. Briefly, spleen cells were suspended in RPMI-1640 medium containing 10% FBS. Splenocytes were plated in 12-well plates at 1 × 10^6 cells/well containing 50 μg/mL MOG35-55 and cultured at 37°C in 5% CO2. After stimulation for 60 h, cell suspension was centrifuged at 3000 rpm for 5 min and supernatants were collected. For the culture of MOGsup-stimulated astrocytes, astrocytes were grown in medium containing 70% DMEM supplemented with 10% FBS and 30% supernatant from MOG35-55-stimulated-splenocytes.”

      - For general astrocyte morphology: authors showed the cells are GFAP+ astrocytes. It is surprising that these cells do not bear classical astrocyte morphology in cell culture. How long do you culture astrocytes before treatment? How do you explain their morphological difference?

      Astrocytes were cultured for 2 to 3 weeks, which corresponds to 2-3 passages, before treatment. There are several possible reasons why the GFAP+ astrocytes differ from the classical morphology. The first is cell density: in low-density culture, as shown in Figure 1B, we observed that astrocytes adopt a more flattened morphology, whereas in high-density cultures they adopt a stellate shape. Moreover, variations in culture conditions, such as the use of different fetal bovine serum, can also influence the morphology of astrocytes. In addition, the mechanical injury induced by the astrocyte isolation procedure might contribute to variations in their morphology during in vitro cultivation. In summary, the morphological differences observed in GFAP+ astrocytes in cell culture likely result from a combination of culture conditions, cell density, and mechanical injury that occurred during astrocyte isolation.

      - Additional verification of reactive astrocytes could be performed by different reactive astrocyte markers, such as GLAST, Sox9, S100ß. Thus, quantitative analysis of activated astrocytes can be done by counting DAPI vs GLAST, Sox9 or S100ß positive cells.

      We agree with the reviewer that there are other markers of reactive astrocytes, such as GLAST, Sox9 and S100β. However, considerable evidence supports GFAP as the most commonly used reactive astrocyte marker. In most cases, reactive astrocytes overexpress GFAP. GFAP is one of the most consistently induced genes in transcriptomic datasets of reactive astrocytes, confirming its usefulness as a reactive marker (Escartin et al., 2019). Thus, we used GFAP as the marker of astrocyte activation in our study.

      - How you performed quantifications for Figures 1C and 1D should be clearly explained, details are not given.

      The quantification methods for Figure 1C and 1D were added to the figure legend. In general, the mean fluorescence intensity of PKM2 in the different groups of (B) was calculated with ImageJ. The number of cells with nuclear PKM2 was counted manually in Image-Pro Plus software (e.g., nuclear or cytoplasmic, based on the blue DAPI staining). The proportion of nuclear PKM2 was determined by normalizing the count of nuclear PKM2 to the count of DAPI-stained nuclei, which represents the number of cell nuclei.
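
      The ratio described above can be written out as a short calculation. The sketch below is only an illustration under stated assumptions: the function name `nuclear_pkm2_proportion` and the example counts are hypothetical, and the counts are assumed to come from manual scoring of one image field.

```python
# Hypothetical sketch: proportion of cells with nuclear PKM2 in one image field.
# Counts are assumed to come from manual scoring against the DAPI channel.
def nuclear_pkm2_proportion(nuclear_pkm2_count: int, dapi_nuclei_count: int) -> float:
    """Normalize the nuclear-PKM2 count to the number of DAPI-stained nuclei."""
    if dapi_nuclei_count <= 0:
        raise ValueError("at least one DAPI-stained nucleus is required")
    if not 0 <= nuclear_pkm2_count <= dapi_nuclei_count:
        raise ValueError("nuclear PKM2 count must lie between 0 and the DAPI count")
    return nuclear_pkm2_count / dapi_nuclei_count


if __name__ == "__main__":
    # Made-up example: 12 nuclear-PKM2 cells among 40 DAPI-stained nuclei.
    print(f"{nuclear_pkm2_proportion(12, 40):.1%}")
```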

      - "Together, these data demonstrated the nuclear translocation of PKM2 in astrocytes from EAE mice." Here the usage of "suggests" instead of "demonstrated".

      Based on the reviewer's suggestion, we have revised the use of "demonstrated" to "suggest" in this sentence.

      Result 2 and 3:

      - In the literature, DASA-58 is shown to be the activator of PKM2 (https://www.nature.com/articles/nchembio.1060, https://doi.org/10.1016/j.cmet.2019.10.015).

      - Providing references for the inhibitory use of DASA-58 for PKM2 would be appreciated.

      DASA-58 is referred to as a “PKM2 activator” because it enforces the tetramerization of PKM2, enhancing the enzymatic ability of PKM2 to catalyze the conversion of PEP to pyruvate. However, enforced tetramerization reduces the dimeric form of PKM2, thereby inhibiting its nuclear translocation. For this reason, DASA-58 is also used as an inhibitor of PKM2 nuclear translocation. In primary BMDMs, LPS induced nuclear PKM2, whereas driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references to the manuscript.

      - Western blot results and statistics for PKM2 should be quantitatively given for all groups.

      According to the reviewer’s suggestions, we have added the quantification of PKM2 for western blots in figure 2 and figure 3. Quantification of PKM2 in figure 2D is added in Fig S3. Quantification of PKM2 in figure 3D is added in Fig.S4B and Fig. S4C.

      - Figure 3A-B: staining method/details are not mentioned in materials and methods.

      The staining methods are described in the paragraph entitled “Immunofluorescence” in the Materials and Methods section. The description is as follows:

      For cell immunochemistry, cells cultured on glass coverslips were fixed with 4% PFA for 10 min at RT, followed by permeabilization with 0.3% Triton X-100. Non-specific binding was blocked with buffer containing 3% BSA for 30 min at RT. Briefly, samples were then incubated with primary antibodies and secondary antibodies. DAPI was used to stain the nuclei. Tissues and cells were observed and images were acquired using an EVOS FL Auto 2 Cell image system (Invitrogen). The fluorescence intensity was measured by ImageJ.

      - In Figure 3A, in only DASA-58 treated cells, it looks like GFAP staining is decreased. It would be better to include MFI analysis for GFAP in the supplementary information.

      We have added the MFI analysis for GFAP in Figure 3A to Fig. S4A. GFAP expression is decreased after DASA-58 treatment (in both the control and MOGsup conditions). A possible reason is that DASA-58 inhibits PKM2 nuclear transport, which subsequently suppresses the activation of astrocytes and leads to decreased expression of GFAP.

      Result 4

      - Detailed explanation of the mass spectrometry and IP experiments should be given in materials and methods. What are the conditions of the cells? Which groups were analyzed? Are they only MOG stimulated, MOG-DASA-58 treated, or only primary astrocytes without any treatment? The results should be interpreted according to the experimental group that has been analyzed.

      We have added detailed information on mass spectrometry and immunoprecipitation to the Materials and Methods. In general, two groups of cells were subjected to mass spectrometry analysis: primary astrocytes without any treatment and MOGsup-stimulated primary astrocytes. Both groups were immunoprecipitated with anti-PKM2 antibody. Moreover, we have revised the sentence in the manuscript that describes the mass spectrometry. The description is as follows: “To illustrate underlying mechanism accounting for nuclear translocation of PKM2 in astrocytes, we sought to identify PKM2-interacting proteins. Here, unstimulated and MOGsup-stimulated primary astrocytes were subjected to PKM2 immunoprecipitation, followed by mass spectrometry”. Furthermore, a description of these two groups of cells was added to the figure legend of Fig. 4.

      Result 5:

      - For the reader, it would be better to start this part by explaining the role of TRIM21 in cells by referring to the literature.

      We agree with the reviewer that beginning this part by explaining the role of TRIM21 would be better. Accordingly, we have added the following description at the beginning of this part: “TRIM21 is a multifunctional E3 ubiquitin ligase that plays a crucial role in orchestrating diverse biological processes, including cell proliferation, antiviral responses, cell metabolism and inflammatory processes (Chen X. et al., 2022).” The relevant literature has been included: Chen X, Cao M, Wang P, Chu S, Li M, Hou P, et al. The emerging roles of TRIM21 in coordinating cancer metabolism, immunity and cancer treatment. Front Immunol 2022;13:968755.

      - The source and the state of the cells (control vs MOG induced) should be stated (Figure 5A).

      In Figure 5A to 5D, single-cell RNA-seq was performed on CNS tissues from naive mice and from EAE mice at different phases (peak and chronic). We have added this information to the figure legend of Figure 5.

      - Figure 5D can be placed after 5A. Data in Figure 5A is probably from naive animals, if so, it should be stated in the legend where A is explained. The group details of the data shown in Figure 5 should be clearly stated.

      Following the reviewer’s suggestions, we have placed 5D after 5A. Single-cell RNA-seq analysis was performed on CNS tissues from naive mice and EAE mice. This information is stated in the legend of Figure 5A-D: “Single-cell RNA-seq profiles from naive and EAE mice (peak and chronic phase) CNS tissues. Naive (n=2); peak (dpi 14–24, n=3); chronic (dpi 21–26, n=2).”

      - Immunofluorescence images should be replaced with better quality images, in control images, stainings are not visible.

      We have replaced the images in Figure 5H with better-quality images; the staining is now visible in the control images.

      Result 6:

      - Experimental procedures should be given in detail in materials and methods.

      We have revised the Materials and Methods section and added more details. Detailed information was added for astrocyte isolation and immunoprecipitation. Moreover, descriptions of mass spectrometry, Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining, and splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes were added to the Materials and Methods.

      Result 7:

      - In Figure 7A, the mean clinical score seems significantly reduced in the shTRIM21-treated group, although the result text states that the difference is not significant. Could you explain the discrepancy between Figure 7A and the accompanying text?

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have made the corrections in the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      - The staining methods for Luxol fast blue and HE are not given in materials and methods.

      According to the reviewer’s comments, we have added the staining methods for HE and LFB in materials and methods.

      - In Figure 7E, authors claim that MBP staining is low in an image; however, the image covers an area of only approximately 500 μm. One would like to see the demyelinated areas outlined with dashed lines, and also the whole area of the spinal cord sections.

      In Author response image 2, we have added the images for MBP staining of the whole area of spinal cord sections. Demyelinated areas are marked with dashed lines.

      - "TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization." should be supported by references.

      We have added two references for this sentence. Anastasiou D et al. showed that TEPP-46 acts as an activator by stabilizing subunit interactions and promoting tetramer formation of PKM2. Angiari S et al. showed that TEPP-46 prevented the nuclear transport of PKM2 by promoting its tetramerization in T cells.

      These two references are added:

      Angiari S, Runtsch MC, Sutton CE, Palsson-McDermott EM, Kelly B, Rana N, et al. Pharmacological Activation of Pyruvate Kinase M2 Inhibits CD4(+) T Cell Pathogenicity and Suppresses Autoimmunity. Cell metabolism 2020;31(2):391-405.e8.

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      - Could you explain what the prevention stage is?

      The term “prevention stage” was used to describe the administration of TEPP-46 before disease onset. To be more accurate, we have revised the phrase from “prevention stage” to “preventive treatment” as described in other references. For example, Ferrara et al. (Ferrara et al., 2020) used “preventive” and “preventive treatment” to mean administration before disease onset.

      The revised sentences are as follows: “To test the effect of TEPP-46 on the development of EAE, the “preventive treatment” (i.e, administration before disease onset) was administered. Intraperitoneal treatment with TEPP-46 at a dosage of 50 mg/kg every other day from day 0 to day 8 post-immunization with MOG35-55 resulted in decreased disease severity (Fig. S8A).”
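
      The dosing regimen quoted above (50 mg/kg every other day from day 0 to day 8 post-immunization) can be laid out as a short calculation. This is a sketch only: the function names are hypothetical, and the 20 g body weight is an assumed example value, not a figure reported in the response.

```python
# Hypothetical sketch of the preventive TEPP-46 schedule quoted above.
# Dose (50 mg/kg) and days (0 to 8, every other day) come from the quoted text;
# the 20 g body weight used below is an illustrative assumption.
def dosing_days(first_day: int = 0, last_day: int = 8, interval: int = 2) -> list:
    """Return the post-immunization days on which a dose is given."""
    return list(range(first_day, last_day + 1, interval))


def dose_mg(body_weight_g: float, dose_mg_per_kg: float = 50.0) -> float:
    """Convert a mg/kg dose into the absolute amount for one animal."""
    return dose_mg_per_kg * body_weight_g / 1000.0


if __name__ == "__main__":
    days = dosing_days()
    per_injection = dose_mg(20.0)  # assumed 20 g mouse
    print(days, per_injection, len(days) * per_injection)
```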

      - In in vitro experiments, authors used DASA-58, and in vivo they used TEPP-46. What might be the reason that DASA-58 is not applied in vivo?

      The effects of DASA-58 and TEPP-46 in promoting PKM2 tetramerization have been tested in vitro and are documented. Based on in vitro absorption, distribution, metabolism and excretion profiling studies, Anastasiou et al. predicted that TEPP-46 had better in vivo drug exposure than DASA-58. Moreover, TEPP-46, but not DASA-58, has been pharmacokinetically validated in vivo (Anastasiou et al., 2012). Thus, we used TEPP-46 for the in vivo studies.

      - Authors claim that TEPP-46 activates PKM2 and leads to its nuclear translocation; however, they did not verify PKM2 expression in the nucleus.

      To support the claim that TEPP-46 inhibits PKM2 nuclear translocation both in vivo and in vitro, we performed western blotting analysis and immunofluorescence staining. In vitro, TEPP-46 administration inhibited the MOGsup-induced nuclear translocation of PKM2, an effect similar to that of DASA-58 (Author response image 4). The in vivo effects of TEPP-46 were analyzed by co-immunostaining of PKM2 and GFAP. The results showed reduced nuclear staining of PKM2 in spinal cord astrocytes in TEPP-46-treated EAE mice compared with control EAE mice (Figure S7B).

      Author response image 4.

      TEPP-46 inhibited the nuclear transport of PKM2 in primary astrocytes. Nuclear-cytoplasmic protein extraction analysis showed the nuclear and cytoplasmic changes of PKM2 in TEPP-46 treated astrocytes and MOGsup-stimulated astrocytes. Primary astrocytes were pretreated with 50 μM TEPP-46 for 30 min and stimulated with MOGsup for 24 h.

      Supplementary Figure 3:

      - In Figure 3D, merge should be stated on top of the merged images, it is confusing to the reader.

      Following the reviewer’s comments, we have added the label “Merge” on top of the merged images.

      Discussion:

      All results should be discussed in detail by interpreting them according to the literature.

      We have further discussed the results in the discussion section. First, we added a paragraph describing the role of nuclear translocation of PKM2 in diverse CNS diseases. Moreover, a paragraph discussing the nuclear function of PKM2 as a protein kinase or transcriptional co-activator was added. The discussion section is now more comprehensive and discusses nearly all the results in detail by interpreting them according to the literature.

      Reviewer #2 (Recommendations For The Authors):

      The authors could address the following points:

      (1) In Figure 1A, the authors present immunofluorescence staining of PKM2 in both control mice and MOG35-55-induced EAE mice across different stages of disease progression: onset, peak, and chronic stages. Observing the representative images suggests a notable increase in PKM2 levels, particularly within the nucleus of MOG35-55-induced EAE mice. However, to provide a more comprehensive analysis, it would be beneficial for the authors to include statistical data, such as average intensities ± standard deviation (SD), along with the nuclear PKM2 ratio, akin to the presentation for cultured primary astrocytes in vitro in panels B-D. Additionally, the authors should clearly specify the number of technical repeats and the total number of animals utilized for these data sets to ensure transparency and reproducibility of the findings.

      We thank the reviewer for the suggestion. Accordingly, for Figure 1A, we have added the nuclear PKM2 ratio in astrocytes in control mice and at different stages of EAE in Supplementary Figure S1A. The quantification of mean fluorescence intensity (MFI) for PKM2 was added in Figure S1B. In addition, we have added the number of animals used in each group to the figure legend.

      (2) The blue hue observed in the merged images of Figure 1B (lower panel) presents a challenge for interpretation. The source of this coloration remains unclear from the provided information. Did the authors also include a co-stain for the nucleus in their imaging? To enhance clarity, especially for individuals with color vision deficiency, the authors might consider utilizing different color combinations, such as presenting PKM2 in green and GFAP in magenta, which would aid in distinguishing the two components. Furthermore, for in vitro cell analysis, incorporating a nuclear stain could provide valuable insights into estimating the cytosolic-to-nuclear ratio of PKM2.

      For the question relating to the merged images in figure 1B, PKM2 was presented in green, GFAP was presented in red and blue represents the nuclear staining by DAPI. “Merge” represents the merged images of these three colors. To enhance the clarity, we have added the images for the nuclear staining of DAPI.

      (3) To substantiate the conclusion of the authors regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes, employing supplementary methodologies such as high-resolution respirometry and metabolomics could offer valuable insights. These techniques would provide a more comprehensive understanding of metabolic alterations and further validate the observed changes in glycolytic activity.

      While we recognize the merits of techniques such as high-resolution respirometry and metabolomics, we believe that the conclusions regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes are sufficiently supported by the current experimental evidence. Our study relied on a robust set of experiments, including lactate production, glucose consumption, cyto-nuclear localization analysis and western blotting analysis of key enzymes in glycolysis. These results, in conjunction with the literature on the role of PKM2 in various cancer cells, keratinocytes and immune cells, provide a strong foundation for our conclusions. Although metabolomics could offer a global view of the changes in metabolic states in astrocytes, lactate is the end product of aerobic glycolysis, so our analysis of lactate levels under different experimental conditions might be more direct. However, we fully acknowledge that future studies employing these advanced methodologies could provide further insights into the precise mechanisms underlying PKM2's effects on aerobic glycolysis.

      (4) Minor: Why is the style of the columns in Fig. 2 panel D different from that shown in panels B, C, and G of Figure 2?

      To maintain consistency in the column style across Figure 2, we have updated the columns in Figure 2D. We now use the same column style in Fig. 2B, C, D and G.

      (5) The effect of stimulating astrocytes with MOGsup on cell proliferation, as shown in Figure 2E, is very moderate. Does DASA-58 reduce the proliferation of control cells in this assay?

      In response to the reviewer’s questions, we conducted a CCK8 analysis in astrocytes subjected to DASA-58 treatment. As depicted in Author response image 5, administration of DASA-58 did not reduce the proliferation of control cells. This result aligns with our other findings in the glycolysis assays and EdU analysis, where there is no statistical difference between control group and DASA-58-treated group. One plausible explanation for this is that in their steady state, astrocytes in the control group are not in a hyperproliferative state. Under such conditions, inhibiting the translocation of PKM2 via DASA-58 or other inhibitors did not significantly affect the proliferation of astrocytes.

      Author response image 5.

      CCK8 analysis of astrocyte proliferation. Primary astrocytes were pretreated with 50 μM DASA-58 for 30 min before stimulation with MOGsup. Data are represented as mean ± SEM. ***P<0.001. SEM, standard error of the mean.

      (6) The tables and lists in Figure 4, panels A-D, are notably small, hindering readability and comprehension. Consider relocating these components to the supplementary materials as larger versions.

      We have updated the tables and lists, and the lines have been made thicker. As suggested by the reviewer, we have relocated these components to Supplementary Figure S5.

      Reviewer #3 (Recommendations For The Authors):

      Higher magnification images that more clearly show nuclear translocation of PKM2 and pp65 and pSTAT3 immunoreactivity should be added to the figure panels, for example as insets.

      Thank you for pointing out this issue in the manuscript. Following the reviewer’s comments, we have included higher magnification images as insets for Figure 3A, Figure 3B and Figure 2A. These enlarged images now provide a clearer visualization of the nuclear translocation state of PKM2, pp65, and pSTAT3.

      There are occasional wording errors, like features => feathers at line 364.

      We are very sorry for our incorrect writing. We have corrected this spelling mistake in the manuscript.

      Reviewer #4 (Recommendations For The Authors):

      Here below are major and minor concerns on the data presented:

      (1) It is not clear from the Methods section what are the culture conditions defined as 'control' in Figure 1B-D. I believe the control should be culturing with the conditioned medium of normal (non-EAE) mice splenocytes to be sure the effect is not from cytokines naturally secreted by these cells.

      We thank the reviewer for the comments and fully understand the concern. The control refers to non-treated primary astrocytes cultured in conventional DMEM medium supplemented with 10% FBS. We performed experiments to exclude the possibility that the observed effect of MOGsup on the activation of astrocytes comes from cytokines naturally secreted by splenocytes. Splenocytes from normal (non-EAE) mice were isolated and cultured in RPMI-1640 medium containing 10% FBS for 60 hours, and the supernatant was collected. Immunofluorescence staining of PKM2 and GFAP was performed in non-treated primary astrocytes and in astrocytes stimulated with supernatant from control splenocytes. As shown in Figure S1C, no difference in PKM2 expression or localization was observed between the two groups; PKM2 was located mainly in the cytoplasm in these conditions. These results indicate that the observed effect on PKM2 in the MOGsup-stimulated condition is not due to cytokines secreted by splenocytes. Thus, we used non-treated primary astrocytes as controls in our study. To clarify the control group, we have revised the description in the figure legend. The revised expression is as follows: “Immunofluorescence staining of PKM2 (green) with GFAP (red) in non-treated primary astrocytes (control) or primary astrocytes cultured with splenocytes supernatants of MOG35–55-induced EAE mice (MOGsup) for different time points (6 h, 12 h and 24 h).”

      (2) Figure 3D: the presence of PKM2 in the nuclear fraction upon MOGsup together with DASA-58 (last lane of Figure 3D) does not support the proposed hypothesis, and may further indicate that the observed reduction of pSTAT3, pp65, etc. is independent of PKM2 nuclear translocation/astrocyte activation, as it is observed even in the absence of MOGsup.

      Thank you for pointing out this problem. The representative immunoblot of nuclear PKM2 in the original Figure 3D did not show the change clearly, which understandably raised this concern. To strengthen our conclusion that the reduced activation of the STAT3 and p65 pathways is related to the DASA-58-induced decrease in nuclear PKM2, we quantified the nuclear PKM2 level and added the result as Figure S4B. The quantification shows that DASA-58 administration decreased the nuclear level of PKM2 in MOGsup-stimulated astrocytes. We have also updated the PKM2 immunoblot image in Figure 3D.

      (3) Molecular docking indication and deletion co-immunoprecipitation reported in Figure 4 data are not concordant on TRIM21: N-terminal Phe23 and Thr87 (Figure 4E) predicted by MD to bind PKM2 are not in the PRY-SPRY domain suggested by the co-IP experiment (Figure 4I).

      The discrepancy between the molecular docking prediction and the co-immunoprecipitation can be explained as follows:

      First, molecular docking is a computational method that predicts protein-protein interactions from the 3D structures of the proteins. The accuracy of the prediction depends on which structural models of TRIM21 and PKM2 are used, and on factors such as protein flexibility and post-translational modifications. Proteins in vivo carry post-translational modifications that can affect their interactions, and these are not fully captured by docking analysis. For example, in our analysis the N-terminal residues Phe23 and Thr87 of TRIM21 are predicted to have the potential to interact with PKM2 through hydrogen bonds; however, such binding can be influenced by the biological context, such as cell type and pathological condition. Molecular docking can suggest specific residues and binding pockets within a protein complex, but these predictions should be verified by experimental techniques such as immunoprecipitation. To reflect the predictive nature of the docking results, the description has been revised as follows: “TRIM21 is predicted to bound to PKM2 via hydrogen bonds between the amino acids of the two molecules.”

      Co-immunoprecipitation with truncated domains of TRIM21 and PKM2 is an experimental technique that relies on the specific interaction between an antibody and its target protein. It can provide insight into the precise binding domains of TRIM21 and PKM2; as demonstrated in our study, the PRY-SPRY domain of TRIM21 is involved in this binding. In summary, while molecular docking and co-IP are both valuable tools for studying protein-protein interactions, their differing focus and limitations may result in discrepancies between the predicted interaction sites and the experimentally identified interaction domains.

      (4) The Authors state that PKM2 is a substrate of TRIM21 E3 ligase activity, however, this is not proved: i) interaction does not imply a ligase-substrate relationship; ii) the ubiquitination shown in Figure 6C is not performed in denaturing conditions thus the K63-Ub antibody can detect also interacting FLAG-IPed proteins (besides, only a single strong band is seen, not a chain; molecular weights in immunoblot should be indicated); iii) use of a catalytically inactive TRIM21 would be required as well.

      We appreciate the reviewer’s comments on the limitations of the immunoprecipitation and K63-antibody experiments, which do not support the conclusion that PKM2 is a substrate of TRIM21. To avoid any misunderstanding, we have revised the relevant sentence from “Hereby, we recognized PKM2 as a substrate of TRIM21” to “Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21”. We have also revised the title of the relevant part of the Results section from “TRIM21 ubiquitylates and promotes the nuclear translocation of PKM2” to “TRIM21 promotes ubiquitylation and the nuclear translocation of PKM2”. In addition, molecular weights are now indicated for all proteins in the western blots.

      (5) As above, molecular weights should always be indicated in immunoblot.

      Thanks for pointing out this problem in the figures. Accordingly, we have added the molecular weights for every protein tested in immunoblot.

      (6) The authors should describe the EAE mouse model in the text and in the material and methods as it may not be so well known to the entire reader audience, and the basic principle of MOG35-55 stimulation, in order to understand the experimental plan meaning.

      We appreciate the reviewer’s comment on the importance of describing the EAE model for a broader readership. In response, we have described the EAE model both in the main text and in the Materials and Methods section. In the main text, the description was added at the beginning of the first paragraph of the Results section and reads as follows: “EAE is widely used as a mouse model of multiple sclerosis, which is typically induced by active immunization with different myelin-derived antigens along with adjuvants such as pertussis toxin (PTX). One widely used antigen is the myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide (Nitsch et al., 2021), which was adopted in our current studies.”

      We have also added the detailed experimental procedures for EAE induction in the materials and methods section.

      (7) The authors should better explain and give the rationale for the use of splenocytes and why directly activated astrocytes (isolated from the EAE model) cannot be employed to confirm/prove some of the presented data.

      Firstly, splenocytes offer a heterogenous cell population, encompassing T cells and antigen presenting cells (APC), which may better mimic the microenvironment and complex immune responses observed in vivo.

      The myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide is a widely used antigen for EAE induction. MOG35-55 elicits strong T-cell responses, is highly encephalitogenic, and induces a T cell-mediated phenotype of multiple sclerosis in animal models. Thus, by isolating splenocytes, which contain APCs and effector T cells, from EAE mice at disease onset and stimulating them with the MOG35-55 antigen in vitro for 60 hours, the T-cell response of the acute stage of EAE can be mimicked in vitro. The supernatant from MOG35-55-stimulated splenocytes contains high levels of IFN-γ and IL-17A, which in part mimics the pathological process and environment in EAE, and this technique has been documented previously (Chen et al., 2009, Kozela et al., 2015).

      Correspondingly, we have revised the sentence on the use of MOG35-55-stimulated splenocytes from EAE mice and added the relevant references: “Supernatant of MOG35-55-stimulated splenocytes isolated from EAE mice were previously shown to elicit a T-cell response in the acute stage of EAE and are frequently used as an in vitro autoimmune model to investigate MS and EAE pathophysiology (Chen et al., 2009, Du et al., 2019, Kozela et al., 2015).”

      Secondly, activated astrocytes (isolated from the EAE model) cannot be used for in vitro culture for the following reasons:

      (1) Low cell viability. Compared with embryonic or neonatal mice, adult mice yield a limited number of viable cells, mainly because adult tissues have less proliferative capacity.

      (2) Disease-related changes. Astrocytes in EAE mice are exposed to a microenvironment that includes inflammatory cytokines, antigens and other pathological factors. Once removed from this environment, their function and morphology change, which makes in vitro results difficult to interpret.

      For these reasons, the primary astrocytes cultured in vitro were obtained from neonatal mice.

      (8) The authors should indicate the phosphorylation sites they are referring to when analysing p-c-myc, pSTAT3, pp65, etc...

      According to the reviewer’s suggestions, we have added the phosphorylation sites for pSTAT3 (Y705), pp65 (S536), p-c-myc (S62) and pIKK (S176+S180) in the figure panels.

      (9) Reference of DASA-58 and TEPP-46 inhibitors and their specificity should be given.

      Following the reviewer’s comments, we have added the relevant references for the use of DASA-58 and TEPP-46 as inhibitors of PKM2 nuclear transport. In primary BMDMs, LPS induced nuclear translocation of PKM2, whereas driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited this LPS-induced translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). We have added these references to the manuscript.

      To address the selectivity of TEPP-46 and add the references, the relevant sentence has been revised from “TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization” to “TEPP-46 is a selective allosteric activator for PKM2, showing little or no effect on other pyruvate isoforms. It promotes the tetramerization of PKM2, thereby diminishing its nuclear translocation (Anastasiou et al., 2012, Angiari et al., 2020).”

      Reviewing Editor (Recommendations For The Authors):

      The reviewing editor would appreciate it if the original blots from the western blot analysis, which were used to generate the final figures, could be provided.

      We thank the reviewing editor for this comment. Accordingly, we will provide the original blots for the western blot analysis.

      References

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      Escartin C, Guillemaud O, Carrillo-de Sauvage M-A. Questions and (some) answers on reactive astrocytes. Glia 2019;67(12):2221-47.

      Ferrara G, Benzi A, Sturla L, Marubbi D, Frumento D, Spinelli S, et al. Sirt6 inhibition delays the onset of experimental autoimmune encephalomyelitis by reducing dendritic cell migration. Journal of neuroinflammation 2020;17(1):228.

      Lin CC, Edelson BT. New Insights into the Role of IL-1β in Experimental Autoimmune Encephalomyelitis and Multiple Sclerosis. Journal of immunology (Baltimore, Md : 1950) 2017;198(12):4553-60.

      Palsson-McDermott EM, Curtis AM, Goel G, Lauterbach MAR, Sheedy FJ, Gleeson LE, et al. Pyruvate Kinase M2 Regulates Hif-1α Activity and IL-1β Induction and Is a Critical Determinant of the Warburg Effect in LPS-Activated Macrophages. Cell metabolism 2015;21(1):65-80.

      Rao J, Wang H, Ni M, Wang Z, Wang Z, Wei S, et al. FSTL1 promotes liver fibrosis by reprogramming macrophage function through modulating the intracellular function of PKM2. Gut 2022;71(12):2539-50.

      Wheeler MA, Clark IC, Tjon EC, Li Z, Zandee SEJ, Couturier CP, et al. MAFG-driven astrocytes promote CNS inflammation. Nature 2020;578(7796):593-9.

      Zhang J, Feng G, Bao G, Xu G, Sun Y, Li W, et al. Nuclear translocation of PKM2 modulates astrocyte proliferation via p27 and β-catenin pathway after spinal cord injury. Cell Cycle 2015;14(16):2609-18.

    1. Author response:

      We thank the editor and reviewers for their supportive comments about our modeling approach and conclusions, and for raising several valid concerns; we address them briefly below.

      Concerns about model’s biological realism and impact on interpretations

      The goal of this paper was to use an interpretable and modular model to investigate the impact of varying sensorimotor delays. Aspects of the model (e.g. layered architecture, modularity) are inspired by biology; at the same time, necessary abstractions and simplifications (e.g. using an optimal controller) are made for interpretability and generalizability, and they reflect common approaches from past work. The hypothesized effects of certain simplifying assumptions are discussed in detail in Section 3.5. Furthermore, the modularity of our model allows us to readily incorporate additional biological realism (e.g. biomechanics, connectomics, and neural dynamics) in future work. In the revision, we will add citations and edits to the text to clarify these points.

      Concerns that the model is overly complex

      To investigate the impact of sensorimotor delays on locomotion, we built a closed-loop model that recapitulates the complex joint trajectories of fly walking. We agree that locomotion models face a tradeoff between simplicity/interpretability and realism — therefore, we developed a model that was as simple and interpretable as possible, while still reasonably recapitulating joint trajectories and generalizing to novel simulation scenarios. Along these lines, we also did not select a model that primarily recreates empirical data, as this would hinder generalizability and add unnecessary complexity to the model. We do not think these design choices are significant weaknesses of this model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data. We will add citations and edits to the text to clarify these points in the revision.

      Concerns about the validity of the Kinematic Similarity (KS) metric to evaluate walking

      We chose to incorporate only the first two PCA modes in the KS metric because the kernel density estimator performs poorly on high-dimensional data. Our primary use of this metric was to indicate whether the simulated fly continues walking in the presence of perturbations. For technical reasons, it is not feasible to perform equivalent experiments on real walking flies, which is one of the reasons we explore this phenomenon with the model. We note the dramatic shift from walking to non-walking as delay increases (Figure 5). To be thorough, in the revision we will investigate the effect of incorporating additional PCA modes and whether this affects the interpretation of our results. We will additionally edit the discussion and presentation of the KS metric to clarify its purpose in this study. We agree with the reviewers that the KS metric is too coarse to reflect fine details of joint kinematics; indeed, in the unperturbed case, we evaluate our model’s performance using other metrics based on comparisons with empirical data (Figures 2, 7, 8).
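
To make the idea of a density-based walking score concrete, the following is a minimal sketch, not the authors' implementation. It assumes the joint-angle data have already been projected onto the first two PCA modes, uses a fixed-bandwidth Gaussian kernel, and scores a simulated trajectory by its mean log density under a reference distribution. The function names, the bandwidth, and all data points are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde_2d(reference, bandwidth):
    """Return a function that estimates the density at (x, y) from 2-D reference points."""
    # Normalisation of an isotropic 2-D Gaussian kernel, averaged over all reference points.
    norm = 1.0 / (len(reference) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for rx, ry in reference:
            sq_dist = (x - rx) ** 2 + (y - ry) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def mean_log_density(density, points, floor=1e-300):
    """Average log density of the points; the floor avoids log(0) far from the data."""
    return sum(math.log(max(density(x, y), floor)) for x, y in points) / len(points)


# Hypothetical reference data: a closed loop in the plane of the first two PCA modes,
# standing in for the cyclic structure of walking.
reference = [
    (math.cos(2 * math.pi * k / 50), math.sin(2 * math.pi * k / 50)) for k in range(50)
]

# Hypothetical simulated trajectories: one stays on the loop, one sits far from it.
walking_like = [(math.cos(t), math.sin(t)) for t in (0.1, 1.3, 2.9, 4.4)]
stalled = [(3.0, 3.0)] * 4

kde = gaussian_kde_2d(reference, bandwidth=0.2)
score_walking = mean_log_density(kde, walking_like)
score_stalled = mean_log_density(kde, stalled)
```

In this toy setting the trajectory that stays on the reference loop receives a much higher score than the one far from it, which is the coarse walking versus non-walking distinction the metric is used for here.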

    1. Author response:

      We thank the reviewers for their engagement and constructive comments. This provisional response aims to clarify key misconceptions, address major criticisms, and outline our revision plans.

      A primary concern of the reviewers appears to be our model's limitations in addressing a broad range of empirical findings. This, however, misinterprets our core contribution. Our work centers on a cautionary tale: before advocating for newly discovered cell types and their purported special roles in spatial cognition (an approach prevalent in the field), such claims must be tested against alternative (null) hypotheses, which may contradict intuitive expectations. We present such an alternative hypothesis regarding spatial cells and their assumed privileged roles. We show that key findings in the field, namely spatial “cell types”, arise in a set of null models without spatial grounding (including untrained variants), even though these models are not models of spatial processing, and we also found that these cells had no privileged role in representing spatial information.

      Our proposal is not a new model attempting to explain the brain, and therefore we do not aim to capture every empirical finding. Indeed, we would not expect an object recognition model (and its untrained variant) with no explicit spatial grounding to account for all phenomena in spatial cognition. This underscores our key point: if there exists a basic, spatially agnostic model that can explain certain degrees of empirical findings using criteria from the literature (i.e. place, head-direction and border cells), what implications does this have for the more complex theories and models proposed as underlying mechanisms of special cell types?

      Regarding concerns about the limited scope and generalizability of our setting, we will clarify that we considered multiple DNN architectures, both trained and untrained, on multiple decoding tasks (position, head direction, and nearest-wall distance). We plan to extend our experiments further as detailed in the revision plan below.

      Further, there was a methodological concern about using a linear decoder on a fixed DNN for spatial decoding tasks being a form of "hacking". However, linear readout is standard practice in neuroscience to characterize information available in a neural population. Moreover, our tests on untrained networks also showed spatial decoding capabilities, suggesting it's not solely due to the linear readout.

      For our full revision plan:

      (1) We will revise the manuscript to better reflect the points above, clarifying our paper's stance and improving the writing to reduce misconceptions.

      (2) We will address individual public reviews in more detail.

      (3) We intend to address key reviewer recommendations, focusing on better situating our work within the broader context of the existing literature whilst emphasizing the null hypothesis perspective.

      (4) In general, we will consider additional aspects of the literature and conduct new experiments to strengthen the relevance of our work to existing work. We highlight a number of potential experiments which we believe can address reviewer concerns:

      a. Blurring the visual inputs to DNNs to match rodent perception.

      b. Vary environmental settings to verify whether our findings are more generalizable (which we predict to be the case).

      c. Vary the environment to assess remapping effects, which will strengthen the connection of our work to the literature.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we have tempered our claims about such differences and use more caution in the interpretation of these data (Results p. 6 and Discussion p.10). Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of Injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g., if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at each volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, we kept all other parameters constant across animals (see new Supplementary Table 1), all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we now report the results for each AAV-S5E2 injection case separately in a new Supplementary Table 2. This table shows that the results are indeed rather consistent across cases, with slightly greater specificity for injection volumes in the range of 105-180 nl.
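
The distinction between treating ROIs as independent observations and reporting each injection separately can be sketched as follows. This is a minimal illustration only: the injection labels and cell counts are hypothetical and are not taken from the manuscript or its supplementary tables. ROI counts are pooled within each injection first, and variability is then summarised across injections.

```python
from statistics import mean, stdev


def per_injection_specificity(roi_counts):
    """Pool ROI counts within each injection and return specificity (%) per injection.

    roi_counts maps an injection label to a list of
    (double_labeled_cells, reporter_positive_cells) tuples, one per ROI.
    """
    result = {}
    for injection_id, rois in roi_counts.items():
        double_labeled = sum(d for d, _ in rois)
        reporter_positive = sum(r for _, r in rois)
        result[injection_id] = 100.0 * double_labeled / reporter_positive
    return result


# Hypothetical counts, for illustration only.
roi_counts = {
    "inj_A": [(45, 50), (47, 50)],
    "inj_B": [(90, 100)],
    "inj_C": [(38, 40), (57, 60)],
}

per_injection = per_injection_specificity(roi_counts)
values = list(per_injection.values())

# Mean and standard deviation across injections (n = 3 here), not across ROIs.
across_injections = (mean(values), stdev(values))
```

Summarising at the injection level keeps the sample size equal to the number of injections, so a single atypical injection remains visible in the table instead of being averaged away across its ROIs.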

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the abstract, we indeed summarize the overall data and round up the decimals, and state that these percentages are upper bound but that they vary by serotype and layer while in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we have changed 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We have amended this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract (“86-90%”) is also slightly exaggerated relative to the results: “Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl.”

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised abstract we now state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer”.
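
For readers less familiar with the two measures discussed above, the sketch below shows how they are commonly computed from cell counts in a region of interest. It assumes the usual definitions, which this response does not restate: specificity is the percentage of reporter-expressing cells that are also marker-positive (GABA+ or PV+), and coverage (efficiency) is the percentage of marker-positive cells that express the reporter. The function name and the counts are hypothetical.

```python
def specificity_and_coverage(reporter_positive, marker_positive, double_labeled):
    """Return (specificity %, coverage %) from cell counts in one region of interest.

    Assumed definitions:
      specificity = double-labeled cells / reporter-expressing cells
      coverage    = double-labeled cells / marker-positive cells
    """
    if reporter_positive <= 0 or marker_positive <= 0:
        raise ValueError("cell counts must be positive")
    if double_labeled > min(reporter_positive, marker_positive):
        raise ValueError("double-labeled count cannot exceed either population")
    specificity = 100.0 * double_labeled / reporter_positive
    coverage = 100.0 * double_labeled / marker_positive
    return specificity, coverage


# Hypothetical counts: 50 reporter-expressing cells, 60 marker-positive cells,
# 47 cells positive for both.
specificity, coverage = specificity_and_coverage(50, 60, 47)
```

Because the two measures share a numerator but have different denominators, a vector can be highly specific while covering only part of the target population, which is why both values are reported per layer.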

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAVPHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we now provide a new Supplementary Table 1 with titers and other information for each viral vector injection. We also provide information regarding viral preparation in a new section of the Methods entitled “Viral Preparation” (p. 12).

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: “as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington),” and delete “Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato.” These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      These data are indeed anecdotal, but we felt this could be useful information, potentially preventing other primate labs from wasting resources, animals and time, particularly, as some of these vectors have been reported to be selective and efficient in primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that failed either to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent in our hands. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, following the reviewer’s suggestion, we have deleted this information.

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins, as this was unnecessary: the native, non-amplified tdT signal was strong. This is now stated in the Methods (p. 12).

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1 and that we have addressed in our reply to Reviewer 1. For clarity and convenience, we copy that response below.

      “We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      Authors’ reply. In the revised version of the manuscript, we have corrected ambiguous language throughout.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      My Public Review comments can be addressed by dialing down the interpretation of the data or providing appropriate caveats in the presentation of the relevant results and their discussion.

      We have done so. See text additions on p. 6 of the Results and p.10 of the Discussion.

      Minor comments:

      92% of PV+ neurons in the marmoset cortex were GABAergic. Can the authors speculate on the identity of the 8% PV+/GABA- neurons (e.g., on the basis of morphology)? Are they likely excitatory? Are they more likely to represent failures of GABA staining?

      We do not know what the other 8% of PV+/GABA- neurons are because we did not perform any other kind of IHC staining. Our best guess is that at least to some extent these represent failures of GABA staining, which is always challenging to perform in primate cortex. However, in mouse PV expression has been demonstrated in a minority of excitatory neurons.

      "Coverage of the PV-AAV was high, did not depend on injection volume.." The fact that the coverage did not depend on injection volume presumably depends, at least in part, on how ROIs were selected. Surely different volumes of injection transduce different numbers of neurons at different distances from the injection track. This should be clarified.

      The ROIs were selected at the center of the injected site/expression core from sections in which the expression region encompassed all cortical layers. Of course, larger volumes of injection resulted in larger transduced regions and therefore an overall larger number of transduced neurons, but we counted cells only within 100 µm wide ROIs at the center of the injection, and the percent of transduced PV cells in this core region did not vary significantly across volumes. We have clarified the methods of ROI selection (see Methods p. 13).

      Figure 2. What is meant by “absolute” in the legend for Figure 2? (How does “mean absolute density” differ from “mean density?”)

      We meant not relative, but this is obvious from the units, so we have removed the word “absolute” in the legend.

      Some non-significant p-values are indicated by "p>0.05" whereas others are given precisely (e.g., p = 1). Please provide precise p-values throughout. Also, the p-value from a surprisingly large number of comparisons in the first section of the results is "1". Is this due to rounding? Is it possible to get significance in a Bonferroni-corrected Kruskal-Wallis test with only 6 observations per condition?

      We now report exact p values throughout the manuscript, with a couple of exceptions: where reporting a large number of p values would interrupt the flow of the manuscript, we provide the upper bound value and state that all those comparisons were below that value. The minimum sample size for the Kruskal-Wallis test is 5 for each group being compared, and our sample is 6 per group.
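
As a rough illustration of the two statistical points raised in this exchange (why many corrected p values equal exactly 1, and how small a p value six observations per group can yield), the sketch below uses only the Python standard library. It assumes that pairwise post hoc comparisons are made with an exact two-sided rank-sum test and uses a hypothetical count of 10 comparisons; neither detail is taken from the manuscript.

```python
from math import comb


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p * n_comparisons)


def min_exact_ranksum_p(n1, n2):
    """Smallest two-sided p value an exact rank-sum test can return for groups
    of size n1 and n2 (the two groups are completely separated, no ties)."""
    return 2 / comb(n1 + n2, n1)


N_COMPARISONS = 10  # hypothetical number of pairwise comparisons (assumption)

# Best case with 6 observations per group: every value in one group
# exceeds every value in the other.
p_min = min_exact_ranksum_p(6, 6)
p_min_adjusted = bonferroni(p_min, N_COMPARISONS)

# A moderately large uncorrected p value is capped at 1 after correction,
# which is why corrected p values of exactly 1 are common.
p_capped = bonferroni(0.25, N_COMPARISONS)

print(p_min, p_min_adjusted, p_capped)
```

Under these assumptions, the best-case corrected p value stays below 0.05, so significance is attainable with six observations per group, while any uncorrected p value above 1/10 is reported as 1 after correction.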

      Figure 3: The density of tdTomato-expressing cells appears to be greater at the AAV9 injection site than at the AAV1 injection site in the example sections shown. Might some of the differences between serotypes be due to this difference? I would imagine that resolving individual cells with certainty becomes more difficult as the amount of tdTomato expression increases.

      There was an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. Hence the density of tdTomato appeared lower than it is. Moreover, the tdT expression region shown in Fig. 3A is a merge of two sections, while it is only from a single section in panels B and C, leading to the impression of higher density of infected cells in panel A. The pipette used for the injection in panel A was not inserted perfectly vertical to the cortical surface, resulting in an injection site that did not span all layers in a single section; thus, to demonstrate that the injection indeed encompassed all layers (and that the virus infected cells in all layers), we collapsed label from two sections. We have now corrected the magnification of panel C so that it matches the scale bar in panel A, and specify in the figure legend that panel A label is from two sections.

      Text regarding Figure 3: The term “injection sizes” is confusing. I think it is intended to mean “the area over which tdTomato-expressing cells were found” but this should be clarified.

      Throughout the manuscript, we have changed the term injection site to “viral-expression region”.

      Figure 3: What were the titers of the three AAV-h56D vectors?

      Titers are now reported in the new Supplementary Table 1.

      Figure 3: The yellow box in Figure 3C is slightly larger than the yellow boxes in 3A and 3B. Is this an error or should the inset of Figure 3 have a scale bar that differs from the 50 µm scale bar in 3A?

      There were indeed errors in scale bars in this figure, which we have now corrected. Now all boxes have the same scale bar.

      Was MM423 one of the animals that received the AAV-h56D injections or one of the three that received AAV-S5E2 injection?

      This is an animal that received a 315nl injection of AAV-PHP.eB-S5E2.tdTomato. This is now specified in the Methods (see p. 12) and in the new Supplementary Table 1.

      Please provide raw cell counts and post-injection survival times for each animal.

      We now provide this information in Supplementary Tables 1 and 2.

      How were the different injection volumes of the AAV-S5E2 virus arranged by animal? Which volume of the AAV-S5E2 virus was injected into the two animals who received single injections?

      We now provide this information in Supplementary Table 1.

      Figure 6A: the point is made in the text that "[the distribution of tdT+ and PV+ neurons] did not differ significantly... peaking in L2/3 and 4C " Is the fact that the number of tdT+ and PV+ peak in layers 2/3 and 4C a consequence of these layers being thicker than the others? If so, this statement seems trivial.

      No, and this is the reason why we measured density in addition to percent of cells across layers in Figure 2. Figure 2B shows that even when measuring density, therefore normalizing by area, GABA+ and PV+ cell density still peaks in L2/3 and 4. Thus, these peaks do not simply reflect the greater thickness of these layers.

      Do the authors have permission to use data from Xu et al. 2010?

      Yes, we do.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      "Viral strategies to restrict gene expression to PV neurons have also been recently developed (Mehta et al., 2019; Vormstein-Schneider et al., 2020)." Mich et al. should also be cited here. Cell Rep. 2021;34(13):108754.

      We thank the reviewer for pointing out this missing reference. This is now cited.

      “GABA density in L4C did not differ from any other layers, but the percent of GABA+ cells in L4C was significantly higher than in L1 (p=0.009) and 4A/B (p=<0.0001).” This and other similar observations depend on calculating the percentage of cells relative to the total number of DAPI-labeled cells in each layer. Since it is apparent that there must be considerable variability between layers, it would be helpful to add a histogram showing the densities of all DAPI-labeled cells for each layer.

      This is not how we calculated density. Density, as now clarified in the Results on p. 4, was defined as the number of cells per unit area. Counts in each layer were divided by each layers’ counting area. This corrects for differences in number of total labeled cells per layer. Therefore, reporting DAPI density is not necessary (we did not count DAPI cell density per layer).
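
The distinction between the two per-layer measures can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the layer names, counts and counting areas are hypothetical, and "percent" is assumed here to mean each layer's share of all counted cells of one type.

```python
def density_per_mm2(counts, areas_um2):
    """Cells per mm^2 in each layer: layer count divided by that layer's counting area."""
    return {layer: counts[layer] / areas_um2[layer] * 1e6 for layer in counts}


def percent_across_layers(counts):
    """Share (in %) of all counted cells that fall in each layer."""
    total = sum(counts.values())
    return {layer: 100 * n / total for layer, n in counts.items()}


# Hypothetical counts of labelled cells and counting areas (um^2) per layer.
counts = {"L2/3": 40, "L4C": 30, "L5": 10}
areas_um2 = {"L2/3": 40_000, "L4C": 20_000, "L5": 20_000}

densities = density_per_mm2(counts, areas_um2)
percents = percent_across_layers(counts)

for layer in counts:
    print(layer, round(densities[layer]), round(percents[layer], 1))
```

In this made-up example the thicker layer holds the largest share of cells but not the highest density, which is the reason for reporting both measures.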

      "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes, suggesting the different serotypes have different capacity of infecting cortical neurons. AAV7 produced the smallest injection site, which additionally was biased to the superficial and deep layers, with only few cells expressing tdT in the middle layers (Fig. 3B). AAV9 (Fig. 3A) and AAV1 (Fig. 3C) resulted in larger injection sites and infected all cortical layers." Differences noted here might reflect either differences related to the AAV serotype or to differences in titers. Please add details about titers for each vector and add comments as appropriate. Another interpretation would be that there are differences in viral spread within the tissue.

      We have now added Supplementary Table 1, which reports titers in addition to other information about injections. The titers and volumes used for AAV9 and AAV7 were identical, while the titer for AAV1 was higher. Therefore, the differences in infectivity, particularly the much smaller expression region obtained with AAV7, cannot be attributed to titer. This is likely due to differences in tropism and/or viral spread among serotypes. This is now discussed (see Results p. 5 bottom and p. 6 top).

      “Recently, several viral vectors have been identified that selectively and efficiently restrict gene expression to GABAergic neurons and their subtypes across several species, but a thorough validation and characterization of these vectors in primate cortex has lacked.” Is this really a fair statement, or is the characterization presented here also lacking? Methods used by others for quantifying specificity and efficiency are essentially the same as used here. See for example Mich et al. (which is not cited).

      The original validation in primates of the vectors examined in our study was based on small tissue samples and did not examine the laminar expression profile of transgene expression induced by these enhancer-AAVs. For example, the validation of the h56D-AAV in marmoset cortex in the original paper by Mehta et al. (2019) was performed on a tissue biopsy with no knowledge of which cortical layers were included in the tissue sample. The only study that shows laminar expression in primate cortex (Mich et al., which is now cited) shows only qualitative images of viral expression across layers, reporting total specificity and coverage pooled across samples; moreover, the study by Mich et al. deals with different PV-specific enhancers than the ones characterized in our study. Unlike any of the previous studies, here we have quantified specificity and coverage across layers.

      "Specifically, we have shown that the GABA-specific AAV9-h56D (Mehta et al., 2019) induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% coverage, and the PV-specific AAV-PHP.eB-S5E2 (Vormstein-Schneider et al., 2020) induces transgene expression in PV cells with up to 98% specificity and 86-90% coverage." These statements in the discussion repeat the somewhat exaggerated coverage numbers noted above for the Abstract.

      The averages across all layers are reported in the Results. The Abstract and Discussion report upper limits; this is made clear by stating “up to”, and we have now also added “depending on layer”.

      Reviewer #3 (Recommendations For The Authors):

      Abstract:

      • Ln 2: Can you be more specific about what you mean by the 'various functions of inhibition'? e.g. do you mean 'the various inhibitory influences on the local microcircuit' or similar?

      These are listed in the introduction to the paper but there is no space in the abstract to do so. Now the sentence reads: “various computational functions of…”.

      • Ln 5: 'has' to 'is'/'has been'.

      The grammar here is correct “has derived”.

      • Ln 6: humans are primates! Maybe change this to 'nonhuman primates'?

      We have added “non-human”

      • Ln n-1: 'viral vectors represent' -> 'viral vectors are'.

      We have changed it to “are”

      Intro:

      • Many readers may expect 'VIP' to be listed as the third major sub-class of interneurons. Could you note that the 5HT3a receptor-expressing group includes VIP cells?

      Done (p.3).

      • "Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans" - this seems close to being circular logic (not quite, but close). Could you modify this sentence to reflect why understanding cortical function and dysfunction in NHP may be of interest?

      This sentence now reads (p. 3): “Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans, where cortical inhibitory neuron dysfunction has been implicated in many neurological and psychiatric disorders, such as epilepsy, schizophrenia and Alzheimer’s disease (Cheah et al., 2012; Verret et al., 2012; Mukherjee et al., 2019)”. We also note that this was already stated in the previous version of the paper, but in the Discussion section, which read (and still reads on p. 9, 2nd paragraph): “It is important to study inhibitory neuron function in the primate, because it is unclear whether findings in mice apply to higher species, and inhibitory neuron dysfunction in humans has been implicated in several neurological and psychiatric disorders (Marin, 2012; Goldberg and Coulter, 2013; Lewis, 2014).”.

      • "In particular, two recent studies have developed recombinant adeno-associated viral vectors (AAV) that restrict gene expression to GABAergic neurons". This sentence places the emphasis on the wrong component of the technology. The fact that AAV was used is irrelevant; these constructs could equally have been packaged in a lenti, CAV, HSV, rabies, etc. The emphasis should be on the recently developed regulatory elements (the enhancers/promoters).

      Same problem with the following excerpts; this text implies that the serotype/vector confers cell-type selectivity, but the results presented do not support this assertion (the promoter/enhancer is what confers the selectivity).

      • "specifically, three serotypes of an AAV that restricts gene expression to GABAergic neurons".

      • "one serotype of an AAV that restricts gene expression to PV cells".

      • "GABA- and PV-specific AAVs".

      • "GABA-specific AAV" (in results).

      • "PV-specific AAVs".

      • "In this study, we have characterized several AAV vectors designed to restrict expression to GABAergic cells" (in discussion).

      • "GABA-virus". GABA is a NT, not a virus.

      We have modified the language in all these sections and throughout the manuscript.

      Results:

      • Enhancer specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      We agree, and in fact we are not making comparisons between different enhancers (i.e., S5E2 and h56D).

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      The authors need to either: (1) replicate the h56D virus injections in (at least) a second animal, or (2) rewrite the paper to focus on the AAV.PhP mDlx virus alone - for which they have adequate data - and mention the h56D data as an anecdotal result, with clear warnings about the preliminary nature of the observations due to lack of replication.

      We agree about the lack of sufficient data to make strong statements about the differences between serotypes for the h56D-AAV. In the revised version of the manuscript, following the Reviewers’ suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data. We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species. Our edits in regard to this point can be found in the Results on p. 6 and Discussion on p. 10.

      • Did the authors compare h56D vs mDlx? This would be a useful and interesting comparison.

      We did not.

      • 3 tissue sections were used for analysis. How were these selected? Did the authors use a stereological approach?

      For the analysis in Fig. 2, the 3 sections were randomly selected, and for the positioning of the ROIs we selected a region in dorsal V1 anterior to the posterior pole (to avoid laminar distortions due to the curvature of the brain). This is now specified (see p. 4).

      • "both GABA+ and PV+ cells peak in layers" revise for clarity (e.g., the counts peak).

      It now reads “GABA+ and PV+ cell percent and density” (see p. 4).

      • "we refer to this virus as GABA-AAV" these are 3 different viruses!

      The idea here was to use an abbreviation instead of using the full viral name every single time. Clearly the reviewer does not like this, so we have removed this convention throughout the paper and now specify the entire viral name each time.

      • "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes". Do you mean 'resulted in different volumes of expression'?

      Yes. We have now rephrased this as follows: “…resulted in viral expression regions that differed in both size as well as laminar distribution” (p.5).

      • “suggesting the different serotypes have different capacity of infecting cortical neurons”. You can’t draw any firm conclusions from a single injection. The rest of this section of the results, along with the whole of Figure 4, and Figure 7a-d, is in danger of being misleading. Please remove. The best you can do here is to say ‘we injected 3 different viruses that express reporter under the h56D promoter. The results are shown in Figure 3, but these are anecdotal, as only a single injection of each virus was performed’. You could then note in the discussion to what extent these results are consistent with the existing literature (e.g., AAV9 often produces good coverage in NHP – anterograde and retrograde, AAV1 also works well in the CNS, although generally doesn’t infect as aggressively as AAV9. I’m not familiar with any attempts to use AAV7).

      With respect to Fig. 4, our approach in the revised version is detailed above. For convenience, we copy it below. With respect to Fig. 7A-D, we feel the results are more robust because the data from the 3 serotypes were pooled together, as the 3 serotypes similarly downregulated GABA and PV expression at the injection site, and we do not make any statement about differences among serotypes for the data shown in Fig. 7A-D.

      “In the revised version of the manuscript, following the Reviewer’s suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data (see revised text in the Results on p. 6 and in the Discussion on p. 10). We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      • Figure 3: why the large variation in tissue quality? Are the 3 upper images taken at the same magnification? If not, they need different scale bars. The cells in A (upper row) look much smaller than those in B and C, and the size of the 'inset' box varies.

      We thank the reviewer for noticing this. We discovered an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. We have now corrected the error in scale bars. We have also fixed the different box sizes.

      • "Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl." Coverage didn't differ between layers, so revise this to: "Overall, across all layers coverage ranged from 78% to 81.6%." or give an overall mean (~80%).

      We have corrected the sentence as suggested by the Reviewer (see p. 8 first paragraph).

      • "extending farther from the borders" -> "extending beyond the borders".

      We have corrected the sentence as suggested by the Reviewer (see p. 8).

      • "The reduced GABA and PV immunoreactivity caused by the viruses implies that the specificity of the viruses we have validated in this study is likely higher than estimated". Yes, but for balance you should also note that they may harm the physiology of the cell.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Discussion:

      • "but a thorough validation and characterization of these vectors in primate cortex has lacked" better to say "has been limited", because Dimidschstein 2016 (marmoset V1) and Vormstein-Schneider 2020 (macaque S1 and PFC) both reported expression in NHP.

      We have added the following sentence to this paragraph of the Discussion. “In particular, previous studies have not characterized the specificity and coverage of these vectors across cortical layers.”(see p. 8).

      • "whether finding in mice" -> 'whether findings in mice'.

      Corrected, thanks.

      • The discussion re: species differences is missing reference to Krienen 2020 (10.1038/s41586-020-2781-z).

      This reference has been added. Thanks.

      • “Injections of about 200nl volume resulted in higher specificity (95% across layers) and coverage” – this is misleading. The coverage was not statistically different among injection volumes.

      We have added the following sentence: “although coverage did not differ significantly across volumes.” (see p. 10).

      • "it is possible that subtle alteration of the cortical circuit upon parenchymal injection of viruses (including AAVs) leads to alteration of activity-dependent expression of PV and GABA." Or (and I would argue, more likely) the expression of large quantities of your big reporter protein compromised the function of the cell, leading to reduced expression of native proteins. You don't mention any IHC to amplify the RFP signal, so I'm assuming that your images are of direct expression. If so, you are expressing A LOT of reporter protein.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Methods:

      • It's difficult to piece together which viruses were injected in which monkeys, at what volumes, and at what titer. Please compile this info into a table for ease of reference (including any other relevant parameters).

      We now provide a Supplementary Table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this manuscript characterize a new anion-conducting channelrhodopsin, MsACR1, whose spectrum is more red-shifted than that of prior variants. An additional mutant variant of MsACR1, renamed raACR, has a 20 nm red-shifted spectral response with faster kinetics. Due to the spectral shift of these variants, the authors propose that it is possible to inhibit neurons expressing MsACR1 and raACR with light at 635 nm in vivo and in vitro. The authors were able to demonstrate some inhibition in vitro and in vivo with 635 nm light. Overall, the new variants with unique properties should be able to suppress neuronal activity with red-shifted light stimulation.

      Strengths:

      The authors were able to identify a new class of anion-conducting channelrhodopsin and have variants that respond strongly to light with wavelengths >550 nm. The authors were able to demonstrate that this variant, MsACR1, can alter behavior in vivo with 635 nm light. The second major strength of the study is the development of a red-shifted mutant of MsACR1 that has faster kinetics and is red-shifted by 20 nm owing to a single mutation.

      Weaknesses:

      The red-shifted raACR appears to work much less efficiently than MsACR1 even with 635 nm light illumination both in vivo (Figure 4) and in vitro (Figure 3E) despite the 20 nm red-shift. This is inconsistent with the benefits and effects of red-shifting the spectrum in raACR. This usually would suggest raACR either has a lower conductance than MsACR1 or that the membrane/overall expression of raACR is much weaker than MsACR1. Neither of these is measured in the current manuscript.

      Thank you for addressing this crucial issue. We posit that the diminished efficiency of raACR in comparison to MsACR1 WT can be attributed to the tenfold acceleration of its photocycle. As noted by Reviewer 1, the anticipated advantages associated with a red-shifted opsin, particularly in in vivo preparations, are offset by its accelerated off-kinetics. Consequently, the shorter dwell time of the open state leads to a reduced number of conducted ions per photon. Nevertheless, the operational light sensitivity is not drastically altered compared to MsACR WT (Fig. 3C). We believe that the rapid kinetics offer interesting applications, such as the precise inhibition of single action potentials through holography.
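
The argument above can be put into numbers with a deliberately simplified sketch. It assumes a constant unitary current while the channel is open and an exponentially distributed open-state lifetime, so that the mean charge per opening is the product of the two; the current and lifetime values are hypothetical placeholders, not measurements from the manuscript.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per monovalent ion


def ions_per_opening(unitary_current_a, mean_open_time_s):
    """Mean number of monovalent ions conducted during one channel opening,
    assuming a constant unitary current and an exponentially distributed
    open time with the given mean."""
    return unitary_current_a * mean_open_time_s / ELEMENTARY_CHARGE_C


UNITARY_CURRENT_A = 50e-15   # hypothetical unitary current (50 fA)
SLOW_OPEN_TIME_S = 100e-3    # hypothetical open-state lifetime, slower variant
FAST_OPEN_TIME_S = SLOW_OPEN_TIME_S / 10  # tenfold faster closing

ions_slow = ions_per_opening(UNITARY_CURRENT_A, SLOW_OPEN_TIME_S)
ions_fast = ions_per_opening(UNITARY_CURRENT_A, FAST_OPEN_TIME_S)

print(ions_slow, ions_fast, ions_slow / ions_fast)
```

With the unitary current held equal, a tenfold shorter open time gives a tenfold smaller ion count per opening; any difference in conductance or expression between the variants would change this ratio.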

      There are limited comparisons to existing variants of ACRs under the same conditions in the manuscript overall. There should be more parallel comparison with gtACR1, ZipACR, and RubyACR in identical conditions in cultured cell lines, cultured neurons, and in vivo. This should be in terms of overall performance, efficiency, and expression in identical conditions. Without this information, it is unclear whether the effects at 635 nm are due to the expression level which can compensate for the spectral shift.

      We compared MsACR1 and raACR with GtACR1 in ND cells in supplemental figure 4. We concur that further comparisons could be useful to emphasise both the strengths of MsACRs and applications where they may not be as suitable. We are currently in the process of outlining a separate article. We firmly believe that each ACR variant occupies a distinct application niche, which necessitates a more comprehensive electrophysiological comparison to provide valuable insights to the scientific community.

      There should be more raw traces from the recordings of the different variants in response to short pulse stimulation and long pulse stimulation to different wavelengths. It is difficult to judge what the response would be like when these types of information are missing.

      We appreciate Reviewer 1's feedback and have compiled a collection of raw photoresponses, encompassing various pulse widths and wavelengths, which can be found in the Supplementary materials (Supplementary Figures 4 and 5).

      Despite being able to activate the channelrhodopsin with 635 nm light, the main utility of the variant should be transcranial stimulation which was not demonstrated here.

      We concur with Reviewer 1's assessment that MsACR's prime application is indeed transcranial stimulation. However, it's worth emphasising that the full advantages of transcranial optical stimulation become most apparent when animals are truly freely moving without any tethered patch cords. Our ongoing research in the laboratory is dedicated to the development of a wireless LED system that can be securely affixed to the animal's skull. We aim to demonstrate the potential of these novel optogenetic approaches in the field of behavioural neuroscience in the coming year.

      Figure 3B is not clearly annotated and is difficult to match the explanation in the figure legend to the figure. The action potential spikings of neurons expressing raACR in this panel are inhibited as strongly as MsACR1.

      We have enhanced the figure caption and annotations for clarity. The traces presented in Figure 3B are intended to demonstrate the overall effectiveness of each variant. However, it is in the population data analysis, as depicted in Figure 3E, where the meaningful insights are revealed.

      For many characterizations, the number of 'n's are quite low (3-7).

      We acknowledge Reviewer 1's suggestion regarding the in vivo data and agree with the importance of including more animals, as well as control animals. However, we are committed to adhering to the principles of the 3Rs (Replacement, Reduction, Refinement) in animal research, and given the robustness of our observed effects, we will add animals to reach the minimal number of animals per condition (n = 2) to minimise unnecessary animal usage while ensuring statistical power.

      We will continue to adhere to the established standards in the field, aiming for a range of 3 to 7 cells per condition, sourced from at least two independent preparations, to ensure the robustness and reliability of our in vitro data.

      Reviewer #2 (Public Review):

      Summary:

      The authors identified a new chloride-conducting Channelrhodopsin (MsACR1) that can be activated at low light intensities and within the red part of the visible spectrum. Additional engineering of MsACR1 yielded a variant (raACR1) with increased current amplitudes, accelerated kinetics, and a 20nm red-shifted peak excitation wavelength. Stimulation of MsACR1 and raACR1 expressing neurons with 635nm in mice's primary motor cortices inhibited the animals' locomotion.

      Strengths:

      The in vitro characterization of the newly identified ACRs is very detailed and confirms the biophysical properties as described by the authors. Notably, the ACRs are very light sensitive and allow for efficient in vitro inhibition of neurons in the nano Watt/mm^2 range. These new ACRs give neuroscientists and cell biologists a new tool to control chloride flux over biological membranes with high temporal and spatial precision. The red-shifted excitation peaks of these ACRs could allow for multiplexed application with blue-light excited optogenetic tools such as cation-conducting channelrhodopsins or green-fluorescent calcium indicators such as GCaMP.

      Weaknesses:

      The in-vivo characterization of MsACR1 and raACR1 lacks critical control experiments and is, therefore, too preliminary. The experimental conditions differ fundamentally between in vitro and in vivo characterizations. For example, chloride gradients differ within neurons which can weaken inhibition or even cause excitation at synapses, as pointed out by the authors. Notably, the patch pipettes for the in vitro characterization contained low chloride concentrations that might not reflect possible conditions found in the in vivo preparations, i.e., increasing chloride gradients from dendrites to synapses.

      We appreciate Reviewer 2’s feedback regarding missing control experiments. We will respond to these concerns in another section of our manuscript, as suggested.

      Regarding the chloride gradient, we understand the concerns of Reviewer 2, yet we chose these ionic conditions, particularly as they were used in the initial electrical characterization of GtACR1 in a neuronal context (Mahn et al., 2016). We will make sure to provide this context in our manuscript to justify our choice of ionic conditions.

      Interestingly, the authors used soma-targeted (st) MsACR1 and raACR1 for some of their in vitro characterization yielding more efficient inhibition and reduction of co-incidental "on-set" spiking. Still, the authors do not seem to have utilized st-variants in vivo.

      At the time of submission, due to the long-term absence of our lab technician, we were not able to produce purified viruses. Therefore, we decided to move on with the submission. We have now produced the virus externally and will provide the experiments.

      Most importantly, critical in vivo control experiments, such as negative controls like GFP or positive controls like NpHR, are missing. These controls would exclude potential behavioral effects due to experimental artifacts. Moreover, in vivo electrophysiology could have confirmed whether targeted neurons were inhibited under optogenetic stimulations.

      We have several non-injected control animals that we used to calibrate this particular paradigm and never saw similar responses. However, we acknowledge the suggestion of Reviewer 2 and will include the GFP-injected control as recommended.

      Some of these concerns stem from the fact that the pulsed raACR stimulation at 635 nm at 10Hz (Fig. 3E) was far less efficient compared to MsACR1, yet the in vivo comparison yielded very similar results (Fig. 4D).

      As outlined previously, the accelerated photocycle of raACR results in a reduction in photocurrent amplitude, consequently diminishing the potency of inhibition per photon. In the context of in vitro stimulation, where single action potentials are recorded, this reduction in inhibition efficiency is resolved. However, in the realm of in vivo behavioural analysis, the observed effect is not contingent on single action potentials but rather stems from the disruption of the entire M1 motor network. In this context, despite the reduced efficiency of the fast-cycling raACR, it still manages to interrupt the M1 network, leading to similar behavioural outcomes.

      Also, the cortex is highly heterogeneous and comprises excitatory and inhibitory neurons. Using the synapsin promoter, the viral expression paradigm could target both types and cause differential effects, which has not been investigated further, for example, by immunohistochemistry. An alternative expression system, for example, under VGLUT1 control, could have mitigated some of these concerns.

      Indeed, we acknowledge the limitations of our current experimental approach. We are in the process of planning and conducting additional experiments involving Cre-dependent expression of st-MsACR and st-raACR in PV-Cre mice.

      Furthermore, the authors applied different light intensities, wavelengths, and stimulation frequencies during the in vitro characterization, causing varying spike inhibition efficiencies. The in vivo characterization is notably lacking this type of control. Thus, it is unclear why the 635nm, 2s at 20Hz every 5s stimulation protocol, which has no equivalent in the in vitro characterization, was chosen.

      We appreciate the valuable comment from the reviewer. The objective of our in vitro characterization is to elucidate the general effects of specific stimulation parameters on the efficiency of neuronal inhibition. For instance, we aim to demonstrate that lower light intensities result in less efficient inhibition, or that pulse stimulation may lead to a less complete inhibition, albeit significantly reducing the energy input into the system.

      In the in vivo characterization, we face constraints such as animal welfare considerations and limitations in available laser lines, which prevent us from exploring the entire parameter space as comprehensively as in the in vitro preparation. Additionally, it is important to note that membrane capacitance tends to be higher in vivo compared to dissociated hippocampal neurons. Consequently, we have opted for a doubled stimulation frequency from 10 Hz to 20 Hz and the stimulation pattern of 2 seconds “on” and 5 seconds “off”. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and variants.

      In summary, the in vivo experiments did not confirm whether the observed inhibition of mouse locomotion occurred due to the inhibition of neurons or experimental artifacts.

      In addition, the authors' main claim of more efficient neuronal inhibition would require them to threshold MsACR1 and raACR1 against alternative methods such as the red-shifted NpHR variant Jaws or other ACRs to give readers meaningful guidance when choosing an inhibitory tool.

      The light sensitivity of MsACR1 and raACR1 is impressive and well characterized in vitro. However, the authors only reported the overall light output at the fiber tip for the in vivo experiments: 0.5 mW. Without context, it is difficult to evaluate this value. Calculating the light power density at certain distances from the light fiber or thresholding against alternative tools such as NpHR, Jaws, or other ACRs would allow for a more meaningful evaluation.

      We thank the reviewers for their comments.

      Reviewer #1 (Recommendations For The Authors):

      The study would be much strengthened if the authors can perform more experiments and characterization to support their claims, in addition to showing more raw electrophysiological traces/results and not just summary charts and graphs.

      As outlined above, further experiments are planned. We appreciate the suggestion to include more raw electrophysiological traces. Photocurrent traces of all included mutants of MsACR1 measured in ND cells and traces of hippocampal neuronal measurements of non- and soma-targeted MsACR1 and raACR will be included as supplemental figures.

      Reviewer #2 (Recommendations For The Authors):

      Major concern:

      It is unclear if the optogenetic light stimulation in Fig. 4 caused direct inhibition of neuronal activity in M1, which cell types were targeted, and how MsACR1 and raACR1 compare to other optogenetic inhibitors.

      Also, the rationale for the light stimulation (635 nm, 2s, 20Hz, every 5s) is not clear.

      I would suggest the following to address these concerns:

      (1) M1 expression and stimulation of a negative control such as GFP to exclude that experimental artifacts cause the observed behavioral outcomes.

      We are now preparing the required GFP control, and will add it to the new version of the manuscript.

      (2) Expression and stimulation of NpHR as a positive control.

      We will use st-GtACR1 as a positive control.

      (3) Electrophysiological measurements of neuronal activity under optogenetic stimulation to confirm the effectiveness of neuronal inhibition, i.e. suppression of spontaneous firing under light etc.

      We concur with Reviewer 2 regarding the potential value of incorporating such in vivo optrode recordings into our manuscript to enable readers to assess the effectiveness of MsACR. As part of our plan for the next version of the manuscript, we intend to conduct these experiments.

      (4) ChR2 or other cation-conducting channelrhodopsins with the same expression paradigm could be used to observe diametrically opposite effects.

      As Reviewer 2 has already pointed out, the complex interactions that can occur in our viral strategy when an inhibitory opsin is expressed in both excitatory and inhibitory neurons make us sceptical about the possibility of an excitatory opsin leading to opposing effects.

      Considering the non-linear input-output function of cortical circuits, optogenetic activation of neurons, even when expressed in either inhibitory or excitatory neurons, is likely to result in the perturbation of the cortical network, which will likely also lead to locomotor arrest.

      (5) The authors should confirm whether the expression under synapsin preferentially targeted excitatory and inhibitory cells because inhibiting inhibitory cells could lead to the disinhibition of the principal cells. Synapsin promoters can drive expression in glutamatergic and GABAergic neurons. An alternative expression system under VGLUT1 promoter could yield better targeting.

      We concur with Reviewer 2 and will conduct the next set of experiments using the PV-Cre mouse line. Additionally, we will employ in vivo electrophysiology to further confirm the inhibition of the motor cortex network.

      (6) Titration of optogenetic stimulation: The authors should test whether increasing or decreasing light intensities and stimulation frequencies as well as different wavelengths (550 nm vs 635 nm) cause differences in inhibiting locomotion in vivo as they did for inhibiting the neuronal firing in vitro (Fig. 3B-E).

      The non-linear input-output function within cortical networks, coupled with our sole reliance on behaviour as a readout, will pose challenges in resolving subtle effects on locomotion arrest across various stimulation parameters.

      For our planned in vivo electrophysiology recordings, we will measure cortical firing rates as a proxy rather than relying solely on behavioural observations. This approach will allow us to map the fundamental axes of our parameter space in vivo, considering factors such as wavelength, light intensity, and frequency.

      (7) Explanation of why the 20Hz/2s light stimulation protocol was chosen.

      As outlined above, considering animal welfare and increased membrane capacitance in vivo, we opted for the outlined stimulation protocol. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and variants.

      (8) In vivo thresholding against other inhibitory tools, such as RubyACRs, Jaws, etc would provide critical guidance for the audience and potential users. It would be particularly important to compare the necessary light intensities for reaching similar behavioral outcomes.

      We concur with Reviewer 2 and will prepare data using GtACR1 as a reference.

      (9) The authors should calculate or reasonably estimate the in vivo light intensity during optogenetic stimulation to provide a meaningful comparison to their in vitro characterization. Ideally, they can provide an estimated volume for efficient stimulation of MsACR1 and raACR1 and compare it to other optogenetic tools.

      We will conduct a Monte Carlo simulation and offer a comparison of the effective activation volume across various classes of optogenetic tools.
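
      As a rough complement to the planned Monte Carlo simulation, the irradiance below the fiber tip can be bounded with a purely geometric estimate. The sketch below is not the authors' method: it assumes a flat-cleaved fiber, ignores scattering and absorption (so it overestimates the irradiance in tissue), and uses hypothetical values for the core diameter (0.2 mm), the numerical aperture (0.39), and the tissue refractive index (1.36). Only the 0.5 mW tip power comes from the text.

```python
import math

def irradiance_mw_per_mm2(power_mw, depth_mm, core_diameter_mm, na, n_tissue=1.36):
    """Geometric-only irradiance estimate below a flat-cleaved fiber tip.

    Light is assumed to fill a cone whose half-angle follows from the
    numerical aperture. Scattering and absorption are ignored, so the
    result is an upper bound on the irradiance reaching the tissue.
    """
    half_angle = math.asin(na / n_tissue)
    radius = core_diameter_mm / 2 + depth_mm * math.tan(half_angle)
    return power_mw / (math.pi * radius ** 2)

# 0.5 mW at the tip is from the text; core diameter and NA are assumed.
for depth in (0.0, 0.25, 0.5, 1.0):
    print(depth, irradiance_mw_per_mm2(0.5, depth, 0.2, 0.39))
```

      A full Monte Carlo photon-transport model would replace the cone with scattering-dependent spreading, which lowers the values at depth further.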

      Minor concerns:

      (1) Why were st- MsACR1 and raACR1 used in vitro but not in vivo? The viral constructs were described as AAV/DJ-hSyn1-MsACR-mCerulean and AAV/DJ-hSyn1-raACR-mCerulean.

      As mentioned earlier, we were unable to produce purified soma-targeted MsACR variants before the manuscript submission. We will now provide these measurements.

      (2) Light intensities for the spectral measurements are missing.

      During action spectra measurements, a motorised neutral density filter wheel is used to have equal photon flux for all tested wavelengths. Additionally, the light intensity is further reduced by using additional neutral density filters to ensure sufficiently low photocurrents to determine the spectral maximum. Therefore, the light intensity varied between constructs and sometimes measurements. We added the following line to the respective methods section to further clarify this: “(typically in the low µW-range at 𝜆max)”.

      (3) MsACR1 is slower and probably more light-sensitive than raACR1, which is faster but has larger photocurrents. These are complementary tradeoffs, and the audience might wonder how MsACR1 and raACR1 photocurrents compare under similar conditions. Therefore, I suggest an alternative representation in Fig. 2C. That is, the presentation of the excitation spectra under similar light intensities and with absolute photocurrent values.

      Unfortunately, due to the reasons stated above, MsACR1 and raACR action spectra were not recorded with the same light intensity. However, MsACR1 and raACR are compared under the same conditions for Fig. 2B, E, and F (560 nm light at ~3.2 mW/mm2) as well as in Supp. Fig. 4C.

      (4) Figure legends for figures 3F and G are missing details for describing the stimulation paradigm.

      We added more details about the stimulation paradigm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behaviour and reward-based (behavioural psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision-making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behaviour, respectively. After demonstrating that active inference provides a better explanation of behavioural responses, the neuronal correlates of epistemic and instrumental value (under an optimised active inference model) are characterised using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      Strengths:

      The strengths of this work rest upon the theoretical underpinnings and careful deconstruction of the various determinants of choice behaviour using active inference. A particular strength here is that the experimental paradigm is designed carefully to elicit both information-seeking and reward-seeking behaviour; where the information-seeking is itself separated into resolving uncertainty about the context (i.e., latent states) and the contingencies (i.e., latent parameters), under which choices are made. In other words, the paradigm - and its subsequent modelling - addresses both inference and learning as necessary belief and knowledge-updating processes that underwrite decisions.

      The authors were then able to model belief updating using active inference and then look for the neuronal correlates of the implicit planning or policy selection. This speaks to a further strength of this study; it provides some construct validity for the modelling of belief updating and decision-making; in terms of the functional anatomy as revealed by EEG. Empirically, the source space analysis of the neuronal correlates licences some discussion of functional specialisation and integration at various stages in the choices and decision-making.

      In short, the strengths of this work rest upon a (first) principles account of decision-making under uncertainty in terms of belief updating that allows them to model or fit choice behaviour in terms of Bayesian belief updating - and then use relatively state-of-the-art source reconstruction to examine the neuronal correlates of the implicit cognitive processing.

      Response: We are deeply grateful for your careful review of our work and for the thoughtful feedback you have provided. Your dedication to ensuring the quality and clarity of the work is truly admirable. Your comments have been invaluable in guiding us towards improving the paper, and we appreciate your time and effort in not just offering suggestions but also providing specific revisions that we can implement. Your insights have helped us identify areas where we can strengthen the arguments and clarify the methodology.

      Comment 1:

      The main weakness of this report lies in the communication of the ideas and procedures. Although the language is generally excellent, there are some grammatical lapses that make the text difficult to read. More importantly, the authors are not consistent in their use of some terms; for example, uncertainty and information gain are sometimes conflated in a way that might confuse readers. Furthermore, the descriptions of the modelling and data analysis are incomplete. These shortcomings could be addressed in the following way.

      First, it would be useful to unpack the various interpretations of information and goal-seeking offered in the (active inference) framework examined in this study. For example, it will be good to include the following paragraph:

      "In contrast to behaviourist approaches to planning and decision-making, active inference formulates the requisite cognitive processing in terms of belief updating in which choices are made based upon their expected free energy. Expected free energy can be regarded as a universal objective function, specifying the relative likelihood of alternative choices. In brief, expected free energy can be regarded as the surprise expected following some action, where the expected surprise comes in two flavours. First, the expected surprise is uncertainty, which means that policies with a low expected free energy resolve uncertainty and promote information seeking. However, one can also minimise expected surprise by avoiding surprising, aversive outcomes. This leads to goal-seeking behaviour, where the goals can be regarded as prior preferences or rewarding outcomes.

      Technically, expected free energy can be expressed in terms of risk plus ambiguity - or rearranged to be expressed in terms of expected information gain plus expected value, where value corresponds to (log) prior preferences. We will refer to both decompositions in what follows; noting that both decompositions accommodate information and goal-seeking imperatives. That is, resolving ambiguity and maximising information gain have epistemic value, while minimising risk or maximising expected value have pragmatic or instrumental value. These two kinds of values are sometimes referred to in terms of intrinsic and extrinsic value, respectively [1-4]."

      Response 1: We deeply thank you for your comments and corresponding suggestions about our interpretations of active inference. In response to your identified weaknesses and suggestions, we have added corresponding paragraphs in the Methods section (The free energy principle and active inference, line 95-106):

      “Active inference formulates the necessary cognitive processing as a process of belief updating, where choices depend on agents' expected free energy. Expected free energy serves as a universal objective function, guiding both perception and action. In brief, expected free energy can be seen as the expected surprise following some policies. The expected surprise can be reduced by resolving uncertainty, and one can select policies with lower expected free energy which can encourage information-seeking and resolve uncertainty. Additionally, one can minimize expected surprise by avoiding surprising or aversive outcomes (Oudeyer et al., 2007; Schmidhuber et al., 2010). This leads to goal-seeking behavior, where goals can be viewed as prior preferences or rewarding outcomes.

      Technically, expected free energy can also be expressed as expected information gain plus expected value, where the value corresponds to (log) prior preferences. We will refer to both formulations in what follows. Resolving ambiguity, minimizing risk, and maximizing information gain have epistemic value, while maximizing expected value has pragmatic or instrumental value. These two types of values can be referred to in terms of intrinsic and extrinsic value, respectively (Barto et al., 2013; Schwartenbeck et al., 2019).”

      Oudeyer, P. Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in neurorobotics, 1, 108.

      Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE transactions on autonomous mental development, 2(3), 230-247.

      Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise?. Frontiers in psychology, 4, 61898.

      Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. elife, 8, e41703.
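
      The two decompositions mentioned in Comment 1 and Response 1 (risk plus ambiguity, and expected information gain plus expected value) can be checked numerically. The sketch below uses the generic textbook form of expected free energy for a single policy with discrete states and outcomes; it is not the authors' Eq. 9, which additionally weights its terms with fitted coefficients. All names and the toy numbers are hypothetical.

```python
import math

def expected_free_energy(q_s, likelihood, log_pref):
    """Return (risk + ambiguity, -(info gain) - expected value) for one policy.

    q_s        : predicted hidden-state probabilities Q(s | policy)
    likelihood : likelihood[s][o] = P(o | s)
    log_pref   : log prior preferences ln P(o)
    """
    n_s, n_o = len(q_s), len(log_pref)
    q_o = [sum(q_s[s] * likelihood[s][o] for s in range(n_s)) for o in range(n_o)]

    # Risk: KL divergence between predicted and preferred outcomes.
    risk = sum(q * (math.log(q) - log_pref[o]) for o, q in enumerate(q_o) if q > 0)
    # Ambiguity: expected entropy of the likelihood mapping.
    ambiguity = -sum(
        q_s[s] * p * math.log(p)
        for s in range(n_s) for p in likelihood[s] if p > 0
    )

    # Expected information gain: E_Q(o)[ KL(Q(s | o) || Q(s)) ].
    info_gain = 0.0
    for o in range(n_o):
        for s in range(n_s):
            joint = q_s[s] * likelihood[s][o]
            if joint > 0:
                info_gain += joint * math.log(joint / (q_o[o] * q_s[s]))
    expected_value = sum(q * log_pref[o] for o, q in enumerate(q_o))

    return risk + ambiguity, -info_gain - expected_value

# Toy example with two hidden states and two outcomes (values are made up).
q_s = [0.7, 0.3]
likelihood = [[0.9, 0.1], [0.2, 0.8]]
log_pref = [math.log(0.8), math.log(0.2)]
g1, g2 = expected_free_energy(q_s, likelihood, log_pref)
print(g1, g2)  # the two decompositions agree up to floating-point error
```

      The agreement holds because the expected information gain equals the entropy of the predicted outcomes minus the ambiguity, so both expressions reduce to the same quantity.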

      Comment 2:

      The description of the modelling of choice behaviour needs to be unpacked and motivated more carefully. Perhaps along the following lines:

      "To assess the evidence for active inference over reinforcement learning, we fit active inference and reinforcement learning models to the choice behaviour of each subject. Effectively, this involved optimising the free parameters of active inference and reinforcement learning models to maximise the likelihood of empirical choices. The resulting (marginal) likelihood was then used as the evidence for each model. The free parameters for the active inference model scaled the contribution of the three terms that constitute the expected free energy (in Equation 6). These coefficients can be regarded as precisions that characterise each subjects' prior beliefs about contingencies and rewards. For example, increasing the precision or the epistemic value associated with model parameters means the subject would update her beliefs about reward contingencies more quickly than a subject who has precise prior beliefs about reward distributions. Similarly, subjects with a high precision over prior preferences or extrinsic value can be read as having more precise beliefs that she will be rewarded. The free parameters for the reinforcement learning model included..."

      Response 2: We deeply thank you for your comments and corresponding suggestions about our description of the behavioral modelling. In response to your identified weaknesses and suggestions, we have added corresponding content in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) (Vrieze 2012) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and prior (the details for the model-free reinforcement learning model can be seen in Eq.S1-11 and the details for the model-based reinforcement learning model can be seen in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the 'BayesianOptimization' package in Python (Frazier 2018), first randomly sampling 1000 times and then iterating for an additional 1000 times.”

      Vrieze, S. I. (2012). Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological methods, 17(2), 228.

      Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.
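
      The model comparison described in Response 2 can be illustrated with a minimal sketch of how a choice log-likelihood is turned into a BIC score. The softmax choice rule is an assumption suggested by the temperature parameter mentioned above; the authors' exact likelihood is defined in their supplementary equations and is not reproduced here. Function names and the toy data are hypothetical.

```python
import math

def softmax_log_likelihood(choices, values, temperature):
    """Log-likelihood of observed choices under a softmax choice rule.

    choices     : index of the option chosen on each trial
    values      : values[t][a] = model value of option a on trial t
    temperature : softmax temperature (larger means noisier choices)
    """
    total = 0.0
    for choice, trial_values in zip(choices, values):
        scaled = [v / temperature for v in trial_values]
        peak = max(scaled)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
        total += scaled[choice] - log_norm
    return total

def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion; lower values indicate a better model."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood

# Toy data: three trials, two options, made-up model values.
choices = [0, 1, 0]
values = [[1.0, 0.5], [0.2, 0.9], [0.6, 0.6]]
ll = softmax_log_likelihood(choices, values, temperature=0.5)
print(bic(ll, n_params=2, n_trials=len(choices)))
```

      Because BIC penalizes each free parameter by the log of the number of trials, a model with more parameters must improve the likelihood enough to offset that penalty.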

      Comment 3:

      In terms of the time-dependent correlations with expected free energy - and its constituent terms - I think the report would benefit from overviewing these analyses with something like the following:

      "In the final analysis of the neuronal correlates of belief updating - as quantified by the epistemic and intrinsic values of expected free energy - we present a series of analyses in source space. These analyses tested for correlations between constituent terms in expected free energy and neuronal responses in source space. These correlations were over trials (and subjects). Because we were dealing with two-second timeseries, we were able to identify the periods of time during decision-making when the correlates were expressed.

      In these analyses, we focused on the induced power of neuronal activity at each point in time, at each brain source. To illustrate the functional specialisation of these neuronal correlates, we present whole-brain maps of correlation coefficients and pick out the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses are presented in a descriptive fashion to highlight the nature and variety of the neuronal correlates, which we unpack in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations."

      Response 3: We deeply thank you for your comments and corresponding suggestions about our description of the regression analysis in the source space. In response to your suggestions, we have added corresponding content in the Results section (EEG results at source level, line 331-347):

      “In the final analysis of the neural correlates of the decision-making process, as quantified by the epistemic and intrinsic values of expected free energy, we presented a series of linear regressions in source space. These analyses tested for correlations over trials between constituent terms in expected free energy (the value of avoiding risk, the value of reducing ambiguity, extrinsic value, and expected free energy itself) and neural responses in source space. Additionally, we also investigated the neural correlate of (the degree of) risk, (the degree of) ambiguity, and prediction error. Because we were dealing with a two-second time series, we were able to identify the periods of time during decision-making when the correlates were expressed. The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).

      In these analyses, we focused on the induced power of neural activity at each time point, in the brain source space. To illustrate the functional specialization of these neural correlates, we presented whole-brain maps of correlation coefficients and picked out the brain region with the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses were presented in a descriptive fashion to highlight the nature and variety of the neural correlates, which we unpacked in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations.”
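
      The regression described in Response 3 (Activity ~ Regressor + Intercept, fitted over trials at each time point) can be sketched in plain Python for a single source. This is an illustration of the model form only, not the MNE implementation the authors used; the function names and the toy data are hypothetical.

```python
def ols_slope_intercept(regressor, activity):
    """Least-squares fit of activity = slope * regressor + intercept over trials."""
    n = len(regressor)
    mean_x = sum(regressor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, activity))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regress_over_time(regressor, source_activity):
    """Fit one regression per time point.

    regressor       : one value per trial (e.g. expected free energy)
    source_activity : source_activity[trial][time] for a single source
    """
    n_times = len(source_activity[0])
    return [
        ols_slope_intercept(regressor, [trial[t] for trial in source_activity])
        for t in range(n_times)
    ]

# Toy data: four trials, two time points.
regressor = [0.0, 1.0, 2.0, 3.0]
source_activity = [[1.0, 0.0], [3.0, -1.0], [5.0, -2.0], [7.0, -3.0]]
fits = regress_over_time(regressor, source_activity)
print(fits)
```

      Repeating this fit for every source yields the whole-brain maps of regression coefficients over time that the response describes.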

      Comment 4:

      There was a slight misdirection in the discussion of priors in the active inference framework. The notion that active inference requires a pre-specification of priors is a common misconception. Furthermore, it misses the point that the utility of Bayesian modelling is to identify the priors that each subject brings to the table. This could be easily addressed with something like the following in the discussion:

      "It is a common misconception that Bayesian approaches to choice behaviour (including active inference) are limited by a particular choice of priors. As illustrated in our fitting of choice behaviour above, priors are a strength of Bayesian approaches in the following sense: under the complete class theorem [5, 6], any pair of choice behaviours and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of choice behaviour in terms of some priors. This means that one can, in principle, characterise any given behaviour in terms of the priors that explain that behaviour. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy."

      Response 4: We deeply thank you for your comments and corresponding suggestions about the prior of Bayesian methods. In response to your suggestions, we have added corresponding content in the Discussion section (The strength of the active inference framework in decision-making, line 447-453):

      “However, it may be the opposite. As illustrated in our fitting results, priors can be a strength of Bayesian approaches. Under the complete class theorem (Wald 1947; Brown 1981), any pair of behavioral data and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of behavioral data in terms of some priors. This means that one can, in principle, characterize any given behavioral data in terms of the priors that explain that behavior. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy.”

      Wald, A. (1947). An essentially complete class of admissible decision functions. The Annals of Mathematical Statistics, 549-555.

      Brown, L. D. (1981). A complete class theorem for statistical problems with finite sample spaces. The Annals of Statistics, 1289-1300.

      Reviewer #2 (Public Review):

      Summary:

      Zhang and colleagues use a combination of behavioral, neural, and computational analyses to test an active inference model of exploration in a novel reinforcement learning task.

      Strengths:

      The paper addresses an important question (validation of active inference models of exploration). The combination of behavior, neuroimaging, and modeling is potentially powerful for answering this question.

      Response: We want to express our sincere gratitude for your thorough review of our work and for the valuable comments you have provided. Your attention to detail and dedication to improving the quality of the work are truly commendable. Your feedback has been invaluable in guiding us towards revisions that will strengthen the work. We have made targeted modifications based on most of the comments. However, due to factors such as time and energy constraints, we have not added corresponding analyses for several comments.

      Comment 1:

      The paper does not discuss relevant work on contextual bandits by Schulz, Collins, and others. It also does not mention the neuroimaging study of Tomov et al. (2020) using a risky/safe bandit task.

      Response 1:

      We deeply thank you for your suggestions about the relevant work. We now discuss and cite these representative papers in the Introduction section (line 42-55):

      “The decision-making process frequently involves grappling with varying forms of uncertainty, such as ambiguity - the kind of uncertainty that can be reduced through sampling, and risk - the inherent uncertainty (variance) presented by a stable environment. Studies have investigated these different forms of uncertainty in decision-making, focusing on their neural correlates (Daw et al., 2006; Badre et al., 2012; Cavanagh et al., 2012).

      These studies utilized different forms of multi-armed bandit tasks, e.g., restless multi-armed bandit tasks (Daw et al., 2006; Guha et al., 2010), risky/safe bandit tasks (Tomov et al., 2020; Fan et al., 2022; Payzan-LeNestour et al., 2013), and contextual multi-armed bandit tasks (Schulz et al., 2015; Schulz et al., 2015; Molinaro et al., 2023). However, these tasks either separate risk from ambiguity in uncertainty, or separate action from state (perception). In our work, we develop a contextual multi-armed bandit task to enable participants to actively reduce ambiguity, avoid risk, and maximize rewards using various policies (see Section 2.2 and Figure 4(a)). Our task makes it possible to study whether the brain represents these different types of uncertainty distinctly (Levy et al., 2010) and whether the brain represents both the value of reducing uncertainty and the degree of uncertainty. The active inference framework presents a theoretical approach to investigate these questions. Within this framework, uncertainties can be reduced to ambiguity and risk. Ambiguity is represented by the uncertainty about model parameters associated with choosing a particular action, while risk is signified by the variance of the environment's hidden states. The value of reducing ambiguity, the value of avoiding risk, and extrinsic value together constitute expected free energy (see Section 2.1).”

      Daw, N. D., O'doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

      Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595-607.

      Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral cortex, 22(11), 2575-2586.

      Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 1-50.

      Tomov, M. S., Truong, V. Q., Hundia, R. A., & Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature communications, 11(1), 2371.

      Fan, H., Gershman, S. J., & Phelps, E. A. (2023). Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nature Human Behaviour, 7(1), 102-113.

      Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural representation of unexpected uncertainty during value-based decision making. Neuron, 79(1), 191-201.

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, April). Exploration-exploitation in a contextual multi-armed bandit task. In International conference on cognitive modeling (pp. 118-123).

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, November). Learning and decisions in contextual multi-armed bandit tasks. In CogSci.

      Molinaro, G., & Collins, A. G. (2023). Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biology, 21(7), e3002201.

      Levy, I., Snell, J., Nelson, A. J., Rustichini, A., & Glimcher, P. W. (2010). Neural representation of subjective value under risk and ambiguity. Journal of neurophysiology, 103(2), 1036-1047.

      Comment 2:

      The statistical reporting is inadequate. In most cases, only p-values are reported, not the relevant statistics, degrees of freedom, etc. It was also not clear if any corrections for multiple comparisons were applied. Many of the EEG results are described as "strong" or "robust" with significance levels of p<0.05; I am skeptical in the absence of more details, particularly given the fact that the corresponding plots do not seem particularly strong to me.

      Response 2: We deeply thank you for your comments about our statistical reporting. We have optimized the fitting model and rerun all the statistical analyses. As can be seen (Figures 6, 7, 8, S3, S4, S5), the new regression results are significantly improved compared to the previous ones. Due to space limitations, we have placed the other relevant statistical results, including t-values, standard errors, etc., on our GitHub (https://github.com/andlab-um/FreeEnergyEEG). Currently, we have not conducted multiple comparison corrections, following Reviewer 1’s comment (Comment 3): “Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations”.

      Author response image 1.

      Comment 3:

      The authors compare their active inference model to a "model-free RL" model. This model is not described anywhere, as far as I can tell. Thus, I have no idea how it was fit, how many parameters it has, etc. The active inference model fitting is also not described anywhere. Moreover, you cannot compare models based on log-likelihood, unless you are talking about held-out data. You need to penalize for model complexity. Finally, even if active inference outperforms a model-free RL model (doubtful given the error bars in Fig. 4c), I don't see how this is strong evidence for active inference per se. I would want to see a much more extensive model comparison, including model-based RL algorithms which are not based on active inference, as well as model recovery analyses confirming that the models can actually be distinguished on the basis of the experimental data.

      Response 3: We deeply thank you for your comments about the model comparison details. We previously omitted some information about the comparison model, as classical reinforcement learning is not the focus of our work, so we put the specific details in the supplementary materials. Now we have placed relevant information in the main text (see the part we have highlighted in yellow). We have now added the relevant information regarding the model comparison in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ; the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and the prior (the details for the model-free reinforcement learning model can be found in Eq.S1-11 and the details for the model-based reinforcement learning model can be found in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the 'BayesianOptimization' package in Python, first randomly sampling 1000 times and then iterating for an additional 1000 times.”
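
      As a minimal sketch of the comparison step described above, the BIC can be computed from a maximized log-likelihood, the number of free parameters, and the number of observations. The parameter counts follow the text (five for active inference, two for model-free, three for model-based reinforcement learning). The log-likelihood values and the use of 120 trials as the observation count are placeholder assumptions for illustration only; they are not results from the study.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Placeholder fits for one hypothetical participant:
# (maximized log-likelihood, number of free parameters).
# The log-likelihoods are invented for illustration, not taken from the paper.
fits = {
    "active_inference": (-60.0, 5),  # AL, AI, EX, prior, alpha
    "model_free_rl": (-75.0, 2),     # alpha, gamma
    "model_based_rl": (-70.0, 3),    # alpha, gamma, prior
}

n_obs = 120  # assumption: one observation per trial

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
best_model = min(scores, key=scores.get)  # model with the lowest BIC
```

      Because the penalty term grows with the number of parameters, a five-parameter model is preferred only when its likelihood gain outweighs the extra complexity.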

      We have now incorporated model-based reinforcement learning into our comparison models and placed the descriptions of both model-free and model-based reinforcement learning algorithms in the supplementary materials. We have also changed the criterion for model comparison to Bayesian Information Criterion. As indicated by the results, the performance of the active inference model significantly outperforms both comparison models.

      We had not performed model recovery previously; we have now done so and placed the relevant results in the supplementary materials. The result figures show that each model fits its own simulated data well:

      “To demonstrate how reliable our models are (the active inference model, model-free reinforcement learning model, and model-based reinforcement learning model), we ran simulation experiments for model recovery. We used these three models, each with its own fitted parameters, to generate simulated data. We then fit all three sets of simulated data with each of the three models.

      The model recovery results are shown in Fig.S6. This is the confusion matrix of models: the percentage of all subjects simulated based on a certain model that is fitted best by a certain model. The goodness-of-fit was compared using the Bayesian Information Criterion. We can see that the result of model recovery is very good, and the simulated data generated by a model can be best explained by this model.”
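
      The confusion matrix described above can be sketched as follows: for each generating model, count the fraction of simulated subjects whose data are best fit (lowest BIC) by each candidate model. The function name, the short model labels, and the BIC values are hypothetical and chosen only to illustrate the bookkeeping.

```python
def recovery_confusion_matrix(bic_table, model_names):
    """Rows: generating model. Columns: fitted model.

    bic_table[generating_model] is a list with one dict per simulated
    subject, mapping each fitted model name to its BIC.
    Each entry is the fraction of subjects best fit by that model.
    """
    matrix = {}
    for generating in model_names:
        subjects = bic_table[generating]
        counts = {fitted: 0 for fitted in model_names}
        for bics in subjects:
            winner = min(model_names, key=lambda name: bics[name])
            counts[winner] += 1
        matrix[generating] = {
            fitted: counts[fitted] / len(subjects) for fitted in model_names
        }
    return matrix

models = ["AI", "MF", "MB"]

# Invented BIC values for two simulated subjects per generating model.
toy_bics = {
    "AI": [{"AI": 100, "MF": 120, "MB": 110}, {"AI": 105, "MF": 101, "MB": 115}],
    "MF": [{"AI": 130, "MF": 90, "MB": 95}, {"AI": 125, "MF": 92, "MB": 99}],
    "MB": [{"AI": 118, "MF": 112, "MB": 100}, {"AI": 121, "MF": 119, "MB": 104}],
}

confusion = recovery_confusion_matrix(toy_bics, models)
```

      Good recovery corresponds to a matrix whose diagonal entries are close to 1.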

      Author response image 2.

      Comment 4:

      Another aspect of the behavioral modeling that's missing is a direct descriptive comparison between model and human behavior, beyond just plotting log-likelihoods (which are a very impoverished measure of what's going on).

      Response 4: We deeply thank you for your comments about the comparison between the model and human behavior. Due to the slight differences between our simulation experiments and real behavioral experiments (the "you can ask" stage), we cannot directly compare the model and participants' behaviors. However, we can observe that in the main text's simulation experiment (Figure 3), the active inference agent's behavior is highly consistent with humans (Figure 4), exhibiting an effective exploration strategy and a desire to reduce uncertainty. Moreover, we have included two additional simulation experiments in the supplementary materials, which demonstrate that active inference may potentially fit a wide range of participants' behavioral strategies.

      Author response image 3.

      (An active inference agent with AL=AI=EX=0. It can accomplish tasks efficiently like a human being, reducing the uncertainty of the environment and maximizing the reward.)

      Author response image 4.

      (An active inference agent with AL=AI=0, EX=10. It will only pursue immediate rewards (not choosing the "Cue" option due to additional costs), but it can also gradually optimize its strategy due to random effects.)

      Author response image 5.

      (An active inference agent with EX=0, AI=AL=10. It will only pursue environmental information to reduce the uncertainty of the environment. Even in "Context 2" where immediate rewards are scarce, it will continue to explore.)

      Figure (a) shows the decision-making of active inference agents in the Stay-Cue choice. Blue corresponds to agents choosing the "Cue" option and acquiring "Context 1"; orange corresponds to agents choosing the "Cue" option and acquiring "Context 2"; purple corresponds to agents choosing the "Stay" option and not knowing the information about the hidden state of the environment. The shaded areas below correspond to the probability of the agents making the respective choices.

      Figure (b) shows the decision-making of active inference agents in the Safe-Risky choice. The shaded areas below correspond to the probability of the agents making the respective choices.

      Figure (c) shows the rewards obtained by active inference agents.

      Figure (d) shows the reward prediction errors of active inference agents.

      Figure (e) shows the reward predictions of active inference agents for the "Risky" path in "Context 1" and "Context 2".

      Comment 5:

      The EEG results are intriguing, but it wasn't clear that these provide strong evidence specifically for the active inference model. No alternative models of the EEG data are evaluated.

      Overall, the central claim in the Discussion ("we demonstrated that the active inference model framework effectively describes real-world decision-making") remains unvalidated in my opinion.

      Response 5: We deeply thank you for your comments. We applied the active inference model to analyze the EEG results because, among the models we compared, it best fit the participants' behavioral data, including in the newly added results. Further, our EEG results serve only to verify that the active inference model can be used to analyze the neural mechanisms of decision-making in uncertain environments (in principle, one could also design a better reinforcement learning model with a similar exploration strategy). We aim to emphasize the consistency between active inference and human decision-making in uncertain environments, as we have discussed in the article. Active inference emphasizes both perception and action, which is also what we wish to highlight: during the decision-making process, participants not only passively receive information, but also actively adopt different strategies to reduce uncertainty and maximize rewards.

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes a shift from exploration to exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space." Their results show effects in various regions, which they take to indicate that the brain does perform this task through the theorised active inference scheme.

      Strengths:

      This is an interesting two-stage paradigm that incorporates several interesting processes of learning, exploration/exploitation, and information sampling. Although scalp/brain regions showing sensitivity to the active-inference-related quantities do not necessarily suggest what role they play, it can be illuminating and useful to search for such effects as candidates for further investigation. The aims are ambitious, and methodologically it is impressive to include extensive free-energy theory, behavioural modelling, and EEG source-level analysis in one paper.

      Response: We would like to express our heartfelt thanks to you for carefully reviewing our work and offering insightful feedback. Your attention to detail and commitment to enhancing the overall quality of our work are deeply admirable. Your input has been extremely helpful in guiding us through the necessary revisions to enhance the work. We have implemented focused changes based on a majority of your comments. Nevertheless, owing to limitations such as time and resources, we have not included corresponding analyses for a few comments.

      Comment 1:

      Though I could surmise the above general aims, I could not follow the important details of what quantities were being distinguished and sought in the EEG and why. Some of this is down to theoretical complexity - the dizzying array of constructs and terms with complex interrelationships, which may simply be part and parcel of free-energy-based theories of active inference - but much of it is down to missing or ambiguous details.

      Response 1: We deeply thank you for your comments about our work’s readability. We have significantly revised the descriptions of active inference, models, research questions, etc. Focusing on active inference and the free energy principle, we have added relevant basic descriptions and unified the terminology. We have added information related to model comparison in the main text and supplementary materials. We presented our regression results in clearer language. Our research focused on the brain's representation of decision-making in uncertain environments, including expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, ambiguity, and risk.

      Comment 2:

      In general, an insufficient effort has been made to make the paper accessible to readers not steeped in the free energy principle and active inference. There are critical inconsistencies in key terminology; for example, the introduction states that aim 1 is to distinguish the EEG correlates of three different types of uncertainty: ambiguity, risk, and unexpected uncertainty. But the abstract instead highlights distinctions in EEG correlates between "uncertainty... and... risk" and between "expected free energy .. and ... uncertainty." There are also inconsistencies in mathematical labelling (e.g. in one place 'p(s|o)' and 'q(s)' swap their meanings from one sentence to the very next).

      Response 2: We deeply thank you for your comments about the problem of inconsistent terminology. First, we have unified the symbols and letters (P, Q, s, o, etc.) that appeared in the article and described their respective meanings more clearly. We have also revised the relevant expressions of "uncertainty" throughout the text. In our work, uncertainty refers to ambiguity and risk. Ambiguity can be reduced through continuous sampling and is referred to as uncertainty about model parameters in our work. Risk, on the other hand, is the inherent variance of the environment and cannot be reduced through sampling, which is referred to as uncertainty about hidden states in our work. In the analysis of the results, we focused on how the brain encodes the value of reducing ambiguity (Figure 8), the value of avoiding risk (Figure 6), and (the degree of) ambiguity (Figure S5) during action selection. We also analyzed how the brain encodes reducing ambiguity and avoiding risk during belief update (Figure 7).

      Comment 3:

      Some basic but important task information is missing, and makes a huge difference to how decision quantities can be decoded from EEG. For example:

      - How do the subjects press the left/right buttons - with different hands or different fingers on the same hand?

      Response 3: We deeply thank you for your comments about the missing task information. We have added the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 251-253):

      “Each stage was separated by a jitter ranging from 0.6 to 1.0 seconds. The entire experiment consists of a single block with a total of 120 trials. The participants are required to use any two fingers of one hand to press the buttons (left arrow and right arrow on the keyboard).”

      Comment 4:

      - Was the presentation of the Stay/cue and safe/risky options on the left/right sides counterbalanced? If not, decisions can be formed well in advance especially once a policy is in place.

      Response 4: The presentation of the Stay/cue and safe/risky options on the left/right sides was not counterbalanced. It is true that participants may have made decisions ahead of time. However, to better study the state of participants during decision-making, our choice stages consist of two parts. In the first two seconds, we ask participants to consider which option they would choose, and after these two seconds, participants are allowed to make their choice (by pressing the button).

      We also updated the figure of the experiment procedure as below (We circled the time that the participants spent on making decisions).

      Author response image 6.

      Comment 5:

      - What were the actual reward distributions ("magnitude X with probability p, magnitude y with probability 1-p") in the risky option?

      Response 5: We deeply thank you for your comments about the missing task information. We have placed the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 188-191):

      “The actual reward distribution of the risky path in "Context 1" was [+12 (55%), +9 (25%), +6 (10%), +3 (5%), +0 (5%)] and the actual reward distribution of the risky path in "Context 2" was [+12 (5%), +9 (5%), +6 (10%), +3 (25%), +0 (55%)].”
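
      As a worked check of these distributions, the mean and variance of the risky path in each context can be computed directly. The sketch below uses only the reward magnitudes and probabilities quoted above; the variable and function names are illustrative.

```python
rewards = [12, 9, 6, 3, 0]

# Reward probabilities of the risky path, as stated in the Methods text.
context_probs = {
    "Context 1": [0.55, 0.25, 0.10, 0.05, 0.05],
    "Context 2": [0.05, 0.05, 0.10, 0.25, 0.55],
}

def mean_and_variance(values, probs):
    """Return the expected value and variance of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, variance

summary = {
    name: mean_and_variance(rewards, probs)
    for name, probs in context_probs.items()
}
```

      The expected reward is 9.6 apples in "Context 1" and 2.4 apples in "Context 2", with the same variance (11.34) in both, because the two distributions are mirror images of each other. Relative to the fixed 6 apples of the safe path, the risky path is therefore better on average only in "Context 1".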

      Comment 6:

      The EEG analysis is not sufficiently detailed and motivated.

      For example,

      - why the high lower-filter cutoff of 1 Hz, and shouldn't it be acknowledged that this removes from the EEG any sustained, iteratively updated representation that evolves with learning across trials?

      Response 6: We deeply thank you for your comments about our EEG analysis. The 1Hz high-pass filter may indeed filter out some useful information. We chose a 1Hz high-pass filter to filter out most of the noise and prevent the noise from affecting our results analysis. Additionally, there are also many decision-related works that have applied 1Hz high-pass filtering in EEG data preprocessing (Yau et al., 2021; Cortes et al., 2021; Wischnewski et al., 2022; Schutte et al., 2017; Mennella et al., 2020; Giustiniani et al., 2020).

      Yau, Y., Hinault, T., Taylor, M., Cisek, P., Fellows, L. K., & Dagher, A. (2021). Evidence and urgency related EEG signals during dynamic decision-making in humans. Journal of Neuroscience, 41(26), 5711-5722.

      Cortes, P. M., García-Hernández, J. P., Iribe-Burgos, F. A., Hernández-González, M., Sotelo-Tapia, C., & Guevara, M. A. (2021). Temporal division of the decision-making process: An EEG study. Brain Research, 1769, 147592.

      Wischnewski, M., & Compen, B. (2022). Effects of theta transcranial alternating current stimulation (tACS) on exploration and exploitation during uncertain decision-making. Behavioural Brain Research, 426, 113840.

      Schutte, I., Kenemans, J. L., & Schutter, D. J. (2017). Resting-state theta/beta EEG ratio is associated with reward-and punishment-related reversal learning. Cognitive, Affective, & Behavioral Neuroscience, 17, 754-763.

      Mennella, R., Vilarem, E., & Grèzes, J. (2020). Rapid approach-avoidance responses to emotional displays reflect value-based decisions: Neural evidence from an EEG study. NeuroImage, 222, 117253.

      Giustiniani, J., Nicolier, M., Teti Mayer, J., Chabin, T., Masse, C., Galmès, N., ... & Gabriel, D. (2020). Behavioral and neural arguments of motivational influence on decision making during uncertainty. Frontiers in Neuroscience, 14, 583.

      Comment 7:

      - Since the EEG analysis was done using an array of free-energy-related variables in a regression, was multicollinearity checked between these variables?

      Response 7: We deeply thank you for your comments about our regression. Indeed, we didn't specify our regression formula in the main text. We conducted regression on one variable each time, so there was no need for a multicollinearity check. We have now added the relevant content in the Results section (“EEG results at source level” section, line 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).”
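
      To make the regression formula concrete, the sketch below fits Activity ~ Regressor + Intercept by ordinary least squares for a single source point and time sample, with one value per trial. It is a plain illustration of the formula, not the MNE implementation, and the data values are invented.

```python
def ols_single_regressor(activity, regressor):
    """Fit activity = slope * regressor + intercept by ordinary least squares.

    Both inputs hold one value per trial for a single source point and
    time sample. Returns (slope, intercept).
    """
    n = len(activity)
    mean_x = sum(regressor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented per-trial values: a model-derived regressor and source amplitude.
regressor = [0.0, 1.0, 2.0, 3.0]
activity = [1.0, 3.0, 5.0, 7.0]

slope, intercept = ols_single_regressor(activity, regressor)
```

      Repeating this fit at every source point and time sample, one regressor at a time, yields the time-resolved regression coefficients; with a single regressor per fit, collinearity among regressors does not enter the estimate.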

      Comment 8:

      - In the initial comparison of the first/second half, why just 5 clusters of electrodes, and why these particular clusters?

      Response 8: We deeply thank you for your comments about our sensor-level analysis. These five clusters (left frontal, right frontal, central, left parietal, and right parietal) are commonly analyzed scalp EEG regions, and we followed previous work that analyzed these five clusters of electrodes (Laufs et al., 2006; Ray et al., 1985; Cole et al., 1985). In addition, our work pays more attention to the analysis in source space, exploring the corresponding functions of specific brain regions based on active inference models.

      Laufs, H., Holt, J. L., Elfont, R., Krams, M., Paul, J. S., Krakow, K., & Kleinschmidt, A. (2006). Where the BOLD signal goes when alpha EEG leaves. Neuroimage, 31(4), 1408-1418.

      Ray, W. J., & Cole, H. W. (1985). EEG activity during cognitive processing: influence of attentional factors. International Journal of Psychophysiology, 3(1), 43-48.

      Cole, H. W., & Ray, W. J. (1985). EEG correlates of emotional tasks related to attentional demands. International Journal of Psychophysiology, 3(1), 33-41.

      Comment 9:

      How many different variables are systematically different in the first vs second half, and how do you rule out less interesting time-on-task effects such as engagement or alertness? In what time windows are these amplitudes being measured?

      Response 9 (and the Response for Weaknesses 11): There were no systematic differences between the first half and the second half of the trials, with the only difference being the participants' experience. In the second half, participants had a better understanding of the reward distribution of the task (less ambiguity). The simulation results can well describe these.

      Author response image 7.

      As shown in Figure (a), agents can only learn about the hidden state of the environment ("Context 1" (green) or "Context 2" (orange)) by choosing the "Cue" option. If agents choose the "Stay" option, they will not be able to know the hidden state of the environment (purple). The risk of agents is only related to whether they choose the "Cue" option, not the number of rounds. Figure (b) shows the Safe-Risky choices of agents, and Figure (e) is the reward prediction of agents for the "Risky" path in "Context 1" and "Context 2". We can see that agents update the expected reward and reduce ambiguity by sampling the "Risky" path. The ambiguity of agents is not related to the "Cue" option, but to the number of times they sample the "Risky" path (rounds).

      In our choice stages, participants were required to think about their choices for the first two seconds (during which they could not press buttons). Then, they were asked to make their choices (press buttons) within the next two seconds. This setup effectively kept participants' attention focused on the task. The sensor-level analysis used the two seconds of the “Second choice” stage during which participants decided which option to choose (and could not yet press buttons).

      Comment 10:

      In the comparison of asked and not-asked trials, what trial stage and time window is being measured?

      Response 10: We have added relevant descriptions in the main text. The sensor-level analysis used the two seconds of the “Second choice” stage during which participants decided which option to choose (and could not yet press buttons).

      Author response image 8.

      Comment 11:

      Again, how many different variables, of the many estimated per trial in the active inference model, are different in the asked and not-asked trials, and how can you know which of these differences is the one reflected in the EEG effects?

      Response 11: The difference between asked trials and not-asked trials lies only in whether participants know the specific context of the risky path (the level of risk for the participants). A simple comparison indeed cannot tell us which of these differences is reflected in the EEG effects. Therefore, we subsequently conducted model-based regression analysis in the source space.

      Comment 12:

      The authors choose to interpret that on not-asked trials the subjects are more uncertain because the cue doesn't give them the context, but you could equally argue that they don't ask because they are more certain of the possible hidden states.

      Response 12: Our task design involves randomly varying the context of the risky path. Only by choosing to inquire can participants learn about the context. Participants can become increasingly certain about the reward distribution of each context of the risky path, but cannot determine which specific context applies on a given trial. These are the task instructions given to the participants (line 226-231).

      "You are on a quest for apples in a forest, beginning with 5 apples. You encounter two paths: 1) The left path offers a fixed yield of 6 apples per excursion. 2) The right path offers a probabilistic reward of 0/3/6/9/12 apples, and it has two distinct contexts, labeled "Context 1" and "Context 2," each with a different reward distribution. Note that the context associated with the right path will randomly change in each trial. Before selecting a path, a ranger will provide information about the context of the right path ("Context 1" or "Context 2") in exchange for an apple. The more apples you collect, the greater your monetary reward will be."

      Comment 13:

      - The EEG regressors are not fully explained. For example, an "active learning" regressor is listed as one of the 4 at the beginning of section 3.3, but it is the first mention of this term in the paper and the term does not arise once in the methods.

      Response 13: We have accordingly revised the relevant content in the main text (as in Eq.8). Our regressors now include expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, prediction error, (the degree of) ambiguity, reducing ambiguity, and avoiding risk.

      Comment 14:

      - In general, it is not clear how one can know that the EEG results reflect that the brain is purposefully encoding these very parameters while implementing this very mechanism, and not other, possibly simpler, factors that correlate with them since there is no engagement with such potential confounds or alternative models. For example, a model-free reinforcement learning model is fit to behaviour for comparison. Why not the EEG?

      Response 14: We deeply thank you for your comments. Due to factors such as time and effort, and because the active inference model best fits the behavioral data of the participants, we did not use other models to analyze the EEG data. At both the sensor and source level, we observed EEG signals and brain regions that encode different types of uncertainty (risk and ambiguity). The brain's uncertainty-driven exploration mechanism cannot be explained solely by a simple model-free reinforcement learning approach.

      Recommendations for the authors:

      Response: We have made point-to-point revisions according to the reviewer's recommendations, and as these revisions are relatively minor, we have only responded to the longer recommendations here.

      Reviewer #1 (Recommendations For The Authors)

      I enjoyed reading this sophisticated study of decision-making. I thought your implementation of active inference and the subsequent fitting to choice behaviour - and study of the neuronal (EEG) correlates - was impressive. As noted in my comments on strengths and weaknesses, some parts of your manuscript were difficult to read because of slight collapses in grammar and an inconsistent use of terms when referring to the mathematical quantities. In addition to the paragraphs I have suggested, I would recommend the following minor revisions to your text. In addition, you will have to fill in some of the details that were missing from the current version of the manuscript. For example:

      Recommendation 1:

      Which RL model did you use to fit the behavioural data? What were its free parameters?

      Response 1: We have now added information related to the comparison models in the behavioral results and supplementary materials. We applied both simple model-free reinforcement learning and model-based reinforcement learning. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, while the free parameters for the model-based approach are the learning rate α, the temperature parameter γ, and the prior.
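
The two free parameters of the model-free learner named above can be illustrated with a minimal sketch. This is not the authors' implementation: the delta-rule update, the use of γ as a multiplier on values inside the softmax, and the uniform draw over the right path's outcomes (0/3/6/9/12) are all assumptions made only for illustration, and the function names are hypothetical.

```python
import math
import random

RIGHT_PATH_OUTCOMES = (0, 3, 6, 9, 12)  # outcomes named in the task; uniform draw is an assumption
LEFT_PATH_REWARD = 6                    # fixed yield of the left path


def softmax(values, gamma):
    """Turn action values into choice probabilities; gamma scales the values (assumed role)."""
    scaled = [gamma * v for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def update_q(q, action, reward, alpha):
    """Delta rule: move the chosen action's value toward the received reward."""
    q = list(q)
    q[action] += alpha * (reward - q[action])
    return q


def simulate(n_trials, alpha, gamma, seed=0):
    """Simulate choices between the left (0) and right (1) path."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    choices = []
    for _ in range(n_trials):
        probs = softmax(q, gamma)
        action = 0 if rng.random() < probs[0] else 1
        reward = LEFT_PATH_REWARD if action == 0 else rng.choice(RIGHT_PATH_OUTCOMES)
        q = update_q(q, action, reward, alpha)
        choices.append(action)
    return q, choices
```

Fitting such a model to behaviour would amount to searching over α and γ for the values that make the observed choices most likely; the model-based variant described above would add a prior as a third free parameter.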

      Recommendation 2:

      When you talk about neuronal activity in the final analyses (of time-dependent correlations) what was used to measure the neuronal activity? Was this global power over frequencies? Was it at a particular frequency band? Was it the maximum amplitude within some small window et cetera? In other words, you need to provide the details of your analysis that would enable somebody to reproduce your study at a certain level of detail.

      Response 2: In the final analyses, we used the activity amplitude at each point in the source space. We had already planned to make our data and models available on GitHub to facilitate replication of our work.

      Reviewer #3 (Recommendations For The Authors)

      Recommendation 1:

      It might help to explain the complex concepts up front, to use the concrete example of the task itself - presumably, it was designed so that the crucial elements of the active inference framework come to the fore. One could use hypothetical choice patterns in this task to exemplify different factors such as expected free energy and unexpected uncertainty at work. It would also be illuminating to explain why behaviour on this task is fit better by the active inference model than a model-free reinforcement learning model.

      Response 1: Thank you for your suggestions. We have given clearer explanations to the three terms in the active inference formula: the value of reducing ambiguity, the value of avoiding risk, and the extrinsic value (Eq.8), which makes it easier for readers to understand active inference.

      In addition, active inference can be viewed simply as a computational model similar to model-based reinforcement learning, in which the expected free energy represents a subjective value, without needing to understand its underlying computational principles or neurobiological background. In our discussion, we argue that the active inference model fits the participants' behavior better than our reinforcement learning model because it has an inherent exploration mechanism that is consistent with humans, who instinctively want to reduce environmental uncertainty (lines 435-442).

      “Active inference offers a superior exploration mechanism compared with basic model-free reinforcement learning (Figure 4 (c)). Since traditional reinforcement learning models determine their policies solely on the state, this setting leads to difficulty in extracting temporal information (Laskin et al., 2020) and increases the likelihood of entrapment within local minima. In contrast, the policies in active inference are determined by both time and state. This dependence on time (Wang et al., 2016) enables policies to adapt efficiently, such as emphasizing exploration in the initial stages and exploitation later on. Moreover, this mechanism prompts more exploratory behavior in instances of state ambiguity. A further advantage of active inference lies in its adaptability to different task environments (Friston et al., 2017). It can configure different generative models to address distinct tasks, and compute varied forms of free energy and expected free energy.”

      Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., & Srinivas, A. (2020). Reinforcement learning with augmented data. Advances in neural information processing systems, 33, 19884-19895.

      Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

      Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: a process theory. Neural computation, 29(1), 1-49.

      Recommendation 2:

      Figure 1A provides a key example of the lack of effort to help the reader understand. It suggests the possibility of a concrete example but falls short of providing one. From the caption and text, applied to the figure, I gather that by choosing either to run or to raise one's arms, one can control whether it is daytime or nighttime. This is clearly wrong but it is what I am led to think by the paper.

      Response 2: Thank you for your suggestion, which we had not considered before. In this figure, we aim to illustrate that "the agent receives observations and optimizes his cognitive model by minimizing variational free energy → the agent selects the optimal action by minimizing expected free energy → the action changes the environment → the environment generates new observations for the agent." We have now simplified the image to prevent any possible confusion for readers. Correspondingly, we removed the figure of a person raising their hand and the shadowed house in Figure 1A.

      Author response image 9.

      Recommendation 3:

      I recommend an overhaul in the labelling and methodological explanations for consistency and full reporting. For example, line 73 says sensory input is 's' and the cognitive model is 'q(s),' and the cause of the sensory input is 'p(s|o)' but on the very next line, the cognitive model is 'p(s|o)' and the causes of sensory input are 'q(s).' How this sensory input s relates to 'observations' or 'o' is unclear, and meanwhile, capital S is the set of environmental states. P seems to refer to the generative distribution, but it also means probability.

      Response 3: Thank you for your advice. Now we have revised the corresponding labeling and methodological explanations in our work to make them consistent. However, we are not sure how to make a good modification to P here. In many works, P can refer to a certain probability distribution or some specific probabilities.

      Recommendation 4:

      Even the conception of a "policy" is unclear (Figure 2B). They list 4 possible policies, which are simply the 4 possible sequences of steps, stay-safe, cue-risky, etc, but with no contingencies in them. Surely a complete policy that lists 'cue' as the first step would entail a specification of how they would choose the safe or risky option BASED on the information in that cue.

      Response 4: Thank you for your suggestion. In active inference, a policy actually corresponds to a sequence of actions. The policy of "first choosing 'Cue' and then making the next decision based on specific information" differs from the meaning of policy in active inference.
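
The distinction drawn here can be made concrete with a small sketch. It assumes, from the reviewer's description ("stay-safe, cue-risky, etc"), that a policy is a two-step sequence with a first step in {stay, cue} and a second step in {safe, risky}; the identifiers and the context labels used for the contingent variant are hypothetical and chosen only for illustration.

```python
from itertools import product

FIRST_STEP = ("stay", "cue")      # assumed first-step actions
SECOND_STEP = ("safe", "risky")   # assumed second-step actions


def action_sequence_policies():
    """Policies in the active inference sense: fixed sequences of actions."""
    return [tuple(p) for p in product(FIRST_STEP, SECOND_STEP)]


def contingent_second_steps(contexts=("Context 1", "Context 2")):
    """The reviewer's notion: after 'cue', the second step is a mapping
    from the revealed context to a choice, not a fixed action."""
    return [dict(zip(contexts, choices))
            for choices in product(SECOND_STEP, repeat=len(contexts))]
```

Under these assumptions there are four action sequences, whereas a contingent specification after 'cue' already has four context-to-choice mappings on its own, which is the extra structure the reviewer asks about and the response sets aside.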

      Recommendation 5:

      I assume that the heavy high pass filtering of the EEG (1 Hz) is to avoid having to baseline-correct the epochs (of which there is no mention), but the authors should directly acknowledge that this eradicates any component of decision formation that may evolve in any way gradually within or across the stages of the trial. To take an extreme example, as Figure 3E shows, the expected rewards for the risky path evolve slowly over the course of 60 trials. The filter would eliminate this.

      Response 5: Thank you for your suggestion. The heavy high pass filtering of the EEG (1 Hz) is to minimize the noise in the EEG data as much as possible.

      Recommendation 6:

      There is no mention of the regression itself in the Methods section - the section is incomplete.

      Response 6: Thank you for your suggestion. We have now added the relevant content in the Results section (EEG results at source level, lines 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ∼ Regressor + Intercept, Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned).”
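
To make the quoted model concrete, the following sketch fits Activity ∼ Regressor + Intercept by ordinary least squares for a single source point and time sample. It is a plain-Python illustration of the model form only, not the MNE routine the authors used, and the function name is hypothetical.

```python
def fit_simple_regression(regressor, activity):
    """Ordinary least squares for: activity = slope * regressor + intercept.

    `regressor` holds one value per trial (e.g. a model-derived quantity) and
    `activity` the source amplitude on the same trials at one point and time.
    """
    n = len(regressor)
    if n != len(activity) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 trials")
    mean_x = sum(regressor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    if sxx == 0:
        raise ValueError("regressor is constant across trials")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Repeating this fit at every source point and time sample yields a map of slopes, which is the kind of per-regressor result the source-level analysis reports.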

      Recommendation 7:

      On Lines 260-270 the same results are given twice.

      Response 7: Thank you for your suggestion. We have now deleted redundant content.

      Recommendation 8:

      Frequency bands are displayed in Figure 5 but there is no mention of those in the Methods. In Figure 5b Theta in the 2nd half is compared to Delta in the 1st half- is this an error?

      Response 8: Thank you for your suggestion. It indeed was an error (they should all be Theta) and now we have corrected it.

      Author response image 10.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      This work sets out to elucidate mechanistic intricacies in inflammatory responses in pneumonia in the context of the aging process (Terc deficiency - telomerase functionality).

      Strengths:

      Very interesting, conceptually speaking, approach that is by all means worth pursuing. An overall proper approach to the posited aim.

      We want to thank the reviewer for taking the time to review our manuscript and for providing positive feedback regarding our research question.

      Weaknesses:

      The work is heavily underpowered and may have statistical deficits. This precludes it in its current state from drawing unequivocal conclusions.

      Thank you for this essential and valuable comment. We fully accept that the small sample size of the Tercko/ko mice is a major limitation of our study and transparently discuss this in our manuscript.

      However, due to Animal Welfare regulations, only a reduced number of mice were approved because of the strong burden of disease. Consequently, only three non-infected and five infected mice were available to us. This reduced number of mice presents a clear limitation to our study. However, due to ethical considerations related to animal welfare and sustainability, as well as compliance with German animal welfare regulations, it is not possible to obtain additional Tercko/ko mice to increase the dataset. The animal studies are an important aspect of our study; however, our hypothesis was also investigated at multiple levels, including in an in vitro co-culture model (Figure 5), to ensure comprehensive analysis.

      Thus, we clearly demonstrated that S. aureus pneumonia in Tercko/ko mice leads to a more severe phenotype, orchestrated by the dysregulation of both innate and adaptive immune response.

      Reviewer #2 (Public Review):

      Summary:

      The authors demonstrate heightened susceptibility of Terc-KO mice to S. aureus-induced pneumonia, perform gene expression analysis from the infected lungs, find an elevated inflammatory (NLRP3) signature in some Terc-KO but not control mice, and some reduction in T cell signatures. Based on that, they conclude that dysregulated inflammation and T-cell dysfunction play a major role in these phenomena.

      Strengths:

      The strengths of the work include a problem not previously addressed (the role of the Terc component of the telomerase complex) in certain aspects of resistance to bacterial infection and innate (and maybe adaptive) immune function.

      We would like to thank the reviewer for the positive feedback regarding our aim to investigate the impact of Terc deletion on the pulmonary immune response to S. aureus.

      Weaknesses:

      The weaknesses outweigh the strengths, dominantly because conclusions are plagued by flaws in experimental design, by lack of rigorous controls, and by incomplete and inadequate approaches to testing immune function. These weaknesses are as follows:

      (1)  Terc-KO mice are a genomic knockout model, and therefore the authors need to carefully consider the impact of this KO on a wide range of tissues. This, however, is not the case. There are no attempts to perform cell transfers or use irradiation chimera or crosses that would be informative.

      We thank the reviewer for bringing up this important point. The aim of our study, however, was to investigate the impact of Terc deletion in the lung and on the response to bacterial pneumonia, rather than to provide a comprehensive characterization of the Tercko/ko model itself. Different tissues and cell types have already been characterized in previous studies, including studies of the general phenotype of the model (Herrera et al., 1999; Lee et al., 1998; Rudolph et al., 1999) and investigations of the impact of Terc deletion on specific cell types such as microglia (Khan et al., 2015) or T cells (Matthe et al., 2022). The impact of Terc deletion on T cells is also discussed in our manuscript in lines 89 to 105. Furthermore, a section about the general phenotype of the Terc deletion model is included in the introduction in lines 126 to 138. Thus, we discussed the relevant literature regarding Tercko/ko mice in our manuscript and attempted to provide a more in-depth characterization of the lung by investigating the inflammatory response to infection as well as changes in gene expression (Figures 2-4).

      (2)  Throughout the manuscript the authors invoke the role of telomere shortening in aging, and according to them, their Terc-KO mice should be one potential model for aging. Yet the authors consistently describe major differences between young Terc-KO and naturally aging old mice, with no discussion of the implications. This further confuses the biological significance of this work as presented.

      Thank you for mentioning this relevant point. We want to apologize for the confusion regarding this matter. While Tercko/ko mice are a well-established model for premature aging, these effects become more apparent with increasing generations (G) and thus, G5 and 6 mice are the most affected by Terc deletion (Lee et al., 1998; Wong et al., 2008).

      Since the aim of this study was to analyze the impact of Terc deletion on the lung and its immune response to bacterial infections, rather than the impact of telomere shortening and telomerase dysfunction, young G3 Tercko/ko mice (8 weeks) were used. This is also mentioned in lines 131-134. In this study, Tercko/ko mice were used not as a model of aging, but as a model specifically for Terc deletion. The old WT mice function as a control cohort to observe possible common but also deviating effects between aging and Terc deletion. In our sequencing data, we observe that uninfected young WT mice are very similar to uninfected Tercko/ko mice. Other studies have also reported this lack of major differences between uninfected WT and G3 Tercko/ko mice (Kang et al., 2018). Conversely, uninfected old WT and Tercko/ko mice exhibited great differences, for instance, in the number of differentially expressed genes (Supplemental Figure 1H). Thus, differences between naturally aged mice and young G3 Tercko/ko mice are not surprising. To clarify this aspect, we restructured the paragraph discussing the Tercko/ko mice (lines 126-134). Additionally, we added a paragraph explaining the purpose of the naturally aged mice in lines 134 to 138:

      “As a control cohort, age-matched young WT mice were utilized. To investigate whether Terc deletion, beyond critical telomere shortening, impacts the pulmonary immune response, we used young Tercko/ko mice. Additionally, naturally aged mice (2 years old) were infected to explore the potential link to a fully developed aging phenotype.”

      (3)  Related to #2, group design for comparisons lacks a clear rationale. The authors stipulate that Terc-KO will mimic natural aging, but in fact, the only significant differences seen between groups in susceptibility to S. aureus are, contrary to the authors' expectation, between young Terc-KO and naturally old mice (Figures 1A and B, no difference between young Terc-KO and young wt); or there are no significant differences at all between groups (Figures 1C, D).

      We thank the reviewer for this essential comment. As mentioned above, the Tercko/ko mice in this study were not selected to model natural aging. To model telomerase dysfunction and accelerated aging, later-generation or aged Tercko/ko mice would have been more suitable.

      The lack of statistical significance in some figures is likely due to the heterogeneity of disease phenotype of S. aureus infection in mice, which is a limitation of our study that we discuss in our discussion section in lines 577-583. The phenotype of S. aureus infection can vary greatly within a mouse population, highlighting the limitations of mice as a model for S. aureus infections. To account for this heterogeneity we divided the infected Tercko/ko mice cohort into different degrees of severity based on the clinical score and the presence of bacteria in organs other than the lung (mice with systemic infection).

      Despite the heterogeneity especially within the Tercko/ko mice cohort the differences between the knockout and young as well as old WT mice were striking. Including the fatal infections, 80% of the Tercko/ko mice had a severe course of disease, while none of the WT mice displayed a severe course (Figure 1A, B and Supplemental Figure 1A, B). This hints towards a clear role of Terc in the response to S. aureus infection in mice. Thus while in some figures the differences are not significant, strong trends towards a more severe phenotype of S. aureus infection in the Tercko/ko mice regarding bacterial load, score and inflammatory response could be observed in our study.

      Another example of inadequate group design is when the authors begin dividing their Terc-KO groups by clinical score into animals with or without "systemic infection" (the condition where a bacterium spreads uncontrollably across the many organs and via blood, which should be properly called sepsis), and then compare this sepsis group to other groups (Supplementary Figures 1G; Figure 2; lines 374-376 and 389-391). This gives them significant differences in several figures, but because they did not clearly indicate where they applied this stratification in the figure legends, the data are somewhat confusing. Most importantly, methodologically it is highly inappropriate to compare one mouse with sepsis to another one without. If Terc-KO mice with sepsis are a comparator group, then their controls have to be wild-type mice with sepsis, who are dealing with the same high bacterial load across the body and are presumably forced to deploy the same set of immune defenses.

      We sincerely appreciate the significant time and effort you have invested in reviewing our manuscript. However, with all due respect, we must point out that the definition of sepsis you have referenced is considered outdated. According to the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3), sepsis is defined as "a life-threatening organ dysfunction caused by a dysregulated host response to infection" (Singer et al., 2016, JAMA). Given this fundamental misunderstanding of our findings, we find the comment regarding the inadequacy of our groups to be both dismissive and lacking in scientific merit. We would like to emphasize that the group size used in our study is consistent with accepted standards in infection research. We strongly reject any insinuations of inadequacy that have been repeatedly mentioned throughout the review.

      In order to provide a nuanced investigation of disease severity in Tercko/ko mice, we added the term “systemic infection” to the figures whenever the mice were divided into groups of mice with and without systemic infection. This is the case for Figure 2A and Supplemental Figure 1C-E. The division into mice with and without systemic infection is also mentioned in the figure legend of Figure 2A in lines 933 to 936 and for Supplemental Figure 1 in lines 1053-1054. We agree that Supplemental Figure 1G is somewhat confusing as the mice with systemic infection are highlighted in this graph but not included as a separate group within our sequencing analysis. We added a sentence to the figure legend clarifying this (lines 1042-1045):

      “Nevertheless, the infected Tercko/ko mice were considered one group for the expression analysis and not split into separate groups for the subsequent analysis.”

      Additionally, we revised the section regarding this grouping in different degrees of severity in our Material and Methods section to clarify that this division was only performed for specific analysis (line 191):

      “…for the indicated analysis.”

      Furthermore, as mentioned above, the mice classified as systemically infected were not septic. We classified them as systemically infected based on their clinical score and the presence of bacteria in organs other than the lung, as stated in lines 188-191 and 377-382.

      Bacteremia is a symptom of very severe cases of hospital-acquired pneumonia with a very high mortality (De la Calle et al., 2016).

      Therefore, the systemically infected mice or rather mice with bacteremia display an especially severe pneumonia phenotype, which is distinct from sepsis. The presence of this symptom in our Tercko/ko mice further highlights the clinical relevance of our study. This aspect was added to the manuscript in the lines 569-571.

      “The detection of bacteria in extrapulmonary organs is of particular interest, as bacteremia is a symptom of severe pneumonia and is associated with high mortality (De la Calle et al., 2016).”

      (4)  The authors conclude that dysregulated inflammation and T-cell dysfunction play a major role in S. aureus susceptibility. This may or may not be an important observation, because many KO mice are abnormal for a variety of reasons, and until such reasons are mechanistically dissected, the physiological importance of the observation will remain unclear.

      Two points are important here. First, there is no natural counterpart to a Terc-KO, which is a complete loss of a key non-enzymatic component of the telomerase complex starting in utero.

      Second, the authors truly did not examine the key basic features of their model, including the features of basic and induced inflammatory and immune responses. This analysis could be done either using model antigens in adjuvants, defined innate immune stimuli (e.g. TLR, RLR, or NLR agonists), or microbial challenge. The only data provided along these lines are the baseline frequencies of total T cells in the spleen of the three groups of mice examined (not statistically significant, Figure 4B). We do not know if the composition of naïve to memory T cell subsets may have been different, and more importantly, we have no data to evaluate whether recruitment of the immune response (including T cells) to the lung upon microbial challenge is similar or different. So, what are the numbers and percentages of T cells and alveolar macrophages in the lung following S. aureus challenge and are they even comparable or are there issues in mobilizing the T cell response to the site of infection? If, for example, Terc-KO mice do not mobilize enough T cells to the lung during infection, that would explain the paucity in many T-cell-associated genes in their transcriptomic set that the authors report. That in turn may not mean dysfunction of T cells but potentially a whole different set of defects in coordinating the response in Terc-KO mice.

      We thank the reviewer for highlighting these important aspects. Regarding the first point, there is indeed no naturally occurring deletion of Terc in humans. However, studies have reported reduced expression of Terc and Tert in the tissues of aged mice and rats (Tarry-Adkins et al., 2021; Zhang et al., 2018). Terc itself has been found to have several important immunomodulatory functions, such as the activation of the NF-κB or PI3-kinase pathway (Liu et al., 2019; Wu et al., 2022). As these pathways are relevant for the immune response to S. aureus infections, we were interested in exploring the impact of Terc deletion on the pulmonary immune response. The potential immunomodulatory functions of Terc are discussed in lines 106-121. To further clarify our rationale, we added a sentence to the introduction in lines 121-125.

      “Interestingly, downregulation of Terc and Tert expression in tissues of aged mice and rats has been found (Tarry-Adkins, Aiken, Dearden, Fernandez-Twinn, & Ozanne, 2021; Zhang et al., 2018). Therefore, as a potential immunomodulatory factor, reduced Terc expression could be connected to age-related pathologies.”

      Regarding the second point, as we focused on the effect of Terc deletion in the lung and its role in S. aureus infection, we investigated inflammatory and immune response parameters relevant to this setting. For instance, inflammation parameters in the lungs of all three mouse cohorts were measured to investigate differences in the inflammatory response in non-infected and infected mice (Figure 2A). Those measurements showed no baseline difference in key inflammatory parameters between young WT and Tercko/ko mice, which is consistent with previous findings (Kang et al., 2018). The inflammatory response to infection with S. aureus in the Tercko/ko cohort differed significantly from that of the other cohorts (Figure 2A), hinting towards a dysregulated inflammatory response due to Terc deletion. Furthermore, we investigated the frequencies of general immune cell populations such as dendritic cells, macrophages, and B cells in the spleen of all three cohorts to gain a baseline understanding of these populations. In our manuscript, only total T cell frequencies were included because of their relevance for our T cell data (Figure 4B). These data showed no difference in the total amount of T cells in the spleen among the three cohorts. For more detailed insight into our analysis, we added the frequencies of the other immune cell populations analyzed in the spleen as Supplemental Figure 3B-F. Additionally, a figure legend for the graphs was added.

      Therefore, while we did not analyze baseline frequencies of specific populations of T cells, we analyzed and characterized the inflammatory and immune response of our model in a way relevant to our research question.

      The differences observed in T cell marker and TCR gene expression were also partly present between the uninfected and infected Tercko/ko mice, such as the complete absence of CD247 expression in infected Tercko/ko mice, although it is expressed in uninfected mice of this cohort (Figure 4A, C and D). Thus, this effect cannot be solely attributed to an inadequate mobilization of T cells to the lung after infectious challenge. However, we agree that a more detailed insight into the immune cells recruited to the lung or the frequencies of different T cell populations could contribute to a better understanding of the proposed mechanism and would be an interesting experiment to conduct in further studies. We accept this as a limitation of our study and included it in our discussion section in lines 720-724:

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      (5)  Related to that, immunological analysis is also inadequate. First, the authors pull signatures from the total lung tissue, which is both imprecise and potentially skewed by differences, not in gene expression but in types of cells present and/or their abundance, a feature known to be affected by aging and perhaps by Terc deficiency during infection. Second, to draw any conclusions about immune responses, the authors would have to track antigen-specific T cells, which is possible for a wide range of microbial pathogens using peptide-MHC multimers. This would allow highly precise analysis of phenomena the authors are trying to conclude about. Moreover, it would allow them to confirm their gene expression data in populations of physiological interest

      We thank the reviewer for highlighting this important and relevant point. In our study, we aimed to investigate the role of Terc expression in modulating inflammation and the immune response to S. aureus infection in the lung. To address this, we examined the overall impact of age, genotype, and infection on lung inflammation and gene expression. Therefore, sequencing of total lung tissue was essential for addressing the research question posed. Our findings demonstrate that Tercko/ko mice exhibit a more severe phenotype following S. aureus infection, characterized by an increased bacterial load and heightened lung inflammation (Figures 1 and 2). Furthermore, our data suggest that Terc plays a role in regulating inflammation through activation of the NLRP3 inflammasome, along with the dysregulation of several T cell marker genes (Figures 2, 4, and 5). However, this study lacks a detailed analysis of distinct T cell populations, including antigen-specific T cells, as noted earlier. Investigating these aspects in future studies would be valuable to validate and expand upon our findings. We have incorporated these suggestions into the discussion section (lines 720-724)

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      Nevertheless, our study provides first evidence of a potential connection between T cell functionality and Terc expression.

      Third, the authors co-incubate AM and T cells with S. aureus. There is no information here about the phenotype of T cells used. Were they naïve, and how many S. aureus-specific T cells did they contain? Or were they a mix of different cell types, which we know will change with aging (fewer naïve and many more memory cells of different flavors), and maybe even with a Terc-KO? Naïve T cells do not interact with AM; only effector and memory cells would be able to do so, once they have been primed by contact with dendritic cells bringing antigen into the lymphoid tissues, so it is unclear what the authors are modeling here. Mature primed effector T cells would go to the lung and would interact with AM, but it is almost certain that the authors did not generate these cells for their experiment (or at least nothing like that was described in the methods or the text).

      Thank you for bringing up this important question. For the co-cultivation experiment of T cells and alveolar macrophages, total CD4+ T cells from both young WT and Tercko/ko mice were used. We did not select for a specific population of T cells. Our sequencing data indicated the complete downregulation of CD247 expression, which is an important part of the T cell receptor, in the lungs of infected Tercko/ko mice (Figure 4A, C and D). Given that this factor is downregulated under chronic inflammatory conditions, we investigated the impact of the inflammatory response in alveolar macrophages on the expression of various T cell-derived cytokines, as well as CD247 expression (Figure 5D, E) (Dexiu et al., 2022). This aspect is also highlighted in the discussion in lines 623-637. Therefore, a co-cultivation model of T cells and alveolar macrophages was established and challenged with heat-killed S. aureus to elicit an inflammatory response of the macrophages. To emphasize this purpose, we have revised our statement about the model setup in lines 517-519 of the manuscript:

      “An overactive inflammatory response could be a potential explanation for the dysregulated TCR signaling.”

      The authors hope this will clarify the intent behind the model setup.

      (6)  Overall, the authors began to address the role of Terc in bacterial susceptibility, but to what extent that specifically involves inflammation and macrophages, T cell immunity, or aging remains unclear at present.

      We thank the reviewer for the helpful and relevant comments. We accept the limitations of the present study, such as the reduced number of Tercko/ko mice and the limitations of murine models of S. aureus infection, and discuss them in the discussion section (lines 559-561, 577-583, 690-692, and 720-726). However, we hope that our responses have provided sufficient evidence to convince the reviewer that our data support a clear role for Terc expression in regulating the immune response to bacterial infections, particularly with respect to inflammation and its potential connection to T cell functionality.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance as well as the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (summer with the following characteristics: hot with no precipitation, and winter with the following characteristics: cooler and wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture decomposers contribution. They proposed a new hypothesis that was aiming to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers by using pitfall traps across both experiment seasons. This study was carried out in Israel and based on a single litter species that is native to all seven sites. The authors found that microorganism contribution dominated in winter while macrofauna decomposition dominated the overall decomposition in summer. These seasonality differences combined with the differences in different decomposers groups fluctuation along precipitation resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition.

      Strengths:

      Well-designed study with a combination of different approaches (methods) and consideration of seasonality to generalize patterns.

      The study expands the current understanding of litter decomposition and of the interactions between the factors affecting the process (here climate and decomposers).

      Weaknesses:

      The study was only based on a single litter species.

      We now discuss the advantages and limitations of this approach in the methods and devote a completely new paragraph to this important point in the discussion (lines 394-401).

      Reviewer #2 (Public Review):

      Summary: Torsekar et al. use a leaf litter decomposition experiment across seasons, and in an aridity gradient, to provide a careful test of the role of different-sized soil invertebrates in shaping the rates of leaf litter decomposition. The authors found that large-sized invertebrates are more active in the summer and small-sized invertebrates in the winter. The summed effects of all invertebrates then translated into similar levels of decomposition across seasons. The system breaks down in hyper-arid sites.

      Strengths: This is a well-written manuscript that provides a complete statistical analysis of a nice dataset. The authors provide a complete discussion of their results in the current literature.

      Weaknesses:

      I have only three minor comments. Please standardize the color across ALL figures (use the same color always for the same thing, and be friendly to color-blind people).

      Thank you for this important suggestion. We have now changed all figures to standardize all colors and chose a more color-blind friendly palette.

      Fig 1 may benefit from separating the orange line (micro and meso) into two lines that reflect your experimental setup and results. I would mention the dryland decomposition conundrum earlier in the Introduction.

      We based our novel hypotheses on a thorough literature search. Accordingly, decomposition is expected to be positively associated with moisture, regardless of the decomposer body size. Our contribution to theory was to suggest that macro-detritivores may respond very differently to climatic conditions and dominate litter decomposition in warm arid-lands (we listed the reasons in the text). Consequently, we did not distinguish between microorganisms and mesofauna. We assumed that both groups inhabit the litter substrate and have limited adaptation to dry conditions. Our results provide strong evidence that this presumption is likely wrong and that mesofauna respond to climate very differently from micro-decomposers. Yet, we cannot use hindsight understanding to improve our original hypothesis. We now emphasize this point in the discussion as an important future direction.

      Although we are very appreciative of and pleased with the reviewer's enthusiasm to highlight the importance of our work as a possible solution to the longstanding dryland decomposition conundrum, we decided not to move it to the introduction. This is because we think that our work is not centred on resolving the DDC but provides more general principles that may lead to a paradigm shift in the way ecologists study nutrient cycling across ecosystems.

      And the manuscript is full of minor grammatical errors. Some careful reading and fixing of all these minor mistakes here and there would be needed.

      We apologize and have done our best to find and fix those mistakes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance as well as the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (summer with the following characteristics: hot with no precipitation, and winter with the following characteristics: cooler and wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture decomposers contribution. They proposed a new hypothesis that was aiming to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers by using pitfall traps across both experiment seasons. This study was carried out in Israel and based on a single litter species that is native to all seven sites. The authors found that microorganism contribution dominated in winter while macrofauna decomposition dominated the overall decomposition in summer. These seasonality differences combined with the differences in different decomposers groups fluctuation along precipitation resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition. Below I provide my general and specific comments.

      General comments:

      (1) Study in general is well designed and well thought beforehand,

      (2) Study aims to expand the current understanding of the dryland decomposition conundrum

      (3) The authors should add a caveat about the fact that they only use one litter species and call for examining litter mixtures along the same gradient.

      (4) Please check the way you reduce the random effects from your initial model, I have provided a better way to do so in my specific comments

      (5) For Figure 1, authors can check my comment on this and see if they could revise the figure.

      Thank you for the positive feedback and your valuable comments. We have tried our best to address all comments and suggestions for improvement and clarification.

      Specific comments

      Line # 57 Please write "Theory suggests" instead of "Theory suggest"

      We changed the text as suggested.

      Line # 70, please write "Indeed, handful evidence shows" instead of "Indeed, handful evidence show"

      We changed the text as suggested.

      Figure 1: I like this conceptual framework. I have a silly question: why is the slope of the whole community at the beginning (between Hyperarid and Arid) the same as that of the macrofauna? I would think the slope should be higher, as this is adding up, right? The same goes for the decomposition of the whole community later on. For me this should reflect the adding or summing up (if I am right), so the authors should think about how this could be reflected in the figure.

      We agree with your interpretation that the whole community decomposition reflects the addition by constituent decomposers. The slope of the whole community decomposition between hyper-arid and arid is slightly higher than the one of macro decomposition to reflect the additive effect of macro with meso+micro decomposition. We have now changed the figure slightly to make this point more visible (Line 106).
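      The additive reading discussed here can be written out as a short sketch. The symbols are introduced only for illustration (D for a decomposition rate, P for precipitation) and are not notation from the manuscript; the sketch also assumes strict additivity with no interaction between decomposer groups.

```latex
D_{\text{whole}}(P) = D_{\text{macro}}(P) + D_{\text{meso+micro}}(P)
\quad\Longrightarrow\quad
\frac{dD_{\text{whole}}}{dP}
  = \frac{dD_{\text{macro}}}{dP} + \frac{dD_{\text{meso+micro}}}{dP}
```

      Under that assumption, wherever the meso+micro curve rises with precipitation, even slightly, the whole-community slope exceeds the macro slope by exactly that amount.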

      Line # 111 Please make "Methods" bold as well to be consistent with others headings.

      We changed the formatting as suggested.

      Line #125 and in other lines as well please replace "X" by "x" to denote multiplication.

      We changed the formatting as suggested.

      Table 1 Please add "*" to climate like this "Climate*" so that the end note of the table could make sense

      Thank you for this suggestion. We have now added the asterisk referring to the note below the Table.

      Figure 2, please consider putting at line # 133, mean annual precipitation (MAP), as such for line # 135 you can directly say The precipitation map ....

      We made both changes as suggested.

      Line # 138 I would not use different units for the same values. I do understand that you want to emphasize the accuracy, but I would instead write 3 ± 0.001 g

      We changed the units as suggested.

      Line # 145, how is the litter basket customized to rest at 1 cm above ground level?

      We have now clarified that we cut open windows one centimeter above the cage floor. The cages were positioned on the soil (line 144).

      Lines # 181-183, I like the approach of checking the necessity of having the random effects. However, it has been reported that likelihood ratio tests (LRT) are not really reliable for testing random effects. I suggest you use permutations instead. I think the function is confint(MODEL); you need to specify the number of permutations, the higher the better, but you should start with 99 and see how the results look; if promising, you can even go to 9999. But it will need computation power and time.

      Thank you for the suggestion. We now used a simulation-based exact test, instead of an LRT, to examine the random effect, as recommended by the authors of the “lme4” package. As recommended, we used 9999 simulations. The simulation test yielded results similar to those originally reported (see lines 181-183).
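      To illustrate the general idea of judging a grouping (random) effect by resampling rather than by a likelihood ratio test, here is a minimal sketch in Python. It is not the authors' R-based simulation test: it swaps in a simple label-permutation test on a one-way F statistic, and the function names, the data values, and the block labels are all hypothetical.

```python
import random
from statistics import mean


def f_statistic(values, groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    grand_mean = mean(values)
    k, n = len(by_group), len(values)
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, groups, n_perm=9999, seed=0):
    """Share of label shufflings whose F is at least as large as the observed F."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = f_statistic(values, groups)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any real link between value and group
        if f_statistic(values, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself as one permutation.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical litter mass-loss values (g) from three hypothetical blocks.
    mass_loss = [1.0, 1.1, 0.9, 1.2, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
    block = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    print(permutation_p_value(mass_loss, block, n_perm=999))
```

      A small p-value here would suggest that the grouping factor explains real variance and is worth keeping; the number of permutations trades precision for computation time, as the reviewer notes.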

      Line # 187, 188, 188, please do not use capital letter to start mesofauna, macrofauna and whole-community

      We changed the formatting as suggested.

      Line # 205 Please add the version number of R in the text.

      We now included the version number as suggested.

      Line # 209-211, could you please check whether "then" is the word you want to use or "than"

      Our mistake: we indeed meant “than” and have made the appropriate changes.

      Line # 227 and in other places as well please provide the second degree of freedom of the F test.

      Thank you for this important comment. We have now added the second degree of freedom to the relevant results (lines 229, 232).

      Figure 3 and Figure 4 show some results that are negative, can you please explain what might be the reasons behind this?

      We now explain this important point in the figures’ captions.

      Figure 5 Please add label to the x-axis.

      Thank you; we have now included a label.

      Line # 357, the sentence "... meso-decomposition, like microbial decomposition,...", I don't understand which criteria authors used to classify microbial decomposition as "meso-decomposition"?

      We have now removed this potential cause of confusion by using the term ‘meso-decomposition’ to distinguish it from microbial decomposition (Line 366).

      Line # 380 Kindly put "per se" in italic.

      We changed the formatting as suggested.

      References

      The reference format is not consistent. For example, for the same journal (say Trends in Ecology and Evolution) the authors sometimes wrote the full name, as at line # 36 (and also realize that "vol" should not be written as such), but wrote the abbreviation at line # 42

      Our mistake: we apologize and have carefully checked all references to make sure the style is consistent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) Combined Public Reviews:

      Strengths:

      This work investigates the role of DNAH3 in sperm mobility and male infertility and utilised gold-standard molecular biology techniques, showing strong evidence of its role in male infertility. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.

      We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.

      Weaknesses:

      (1.1) The manuscript lacks a comparison with previous studies on DNAH3 in the Discussion section.

      We thank the reviewers for their comments.

      Recently, Meng et al. identified bi-allelic variants in DNAH3 from patients diagnosed with asthenoteratozoospermia, revealing multiple morphological defects and a disrupted “9+2” arrangement in the patients' sperm (https://doi.org/10.1093/hropen/hoae003, PMID: 38312775). Furthermore, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities with a normally structured “9+2” microtubule arrangement. In our study, we also observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. These findings indicate that DNAH3 may play crucial yet distinct roles in human and mouse male reproduction. Additionally, our TEM analysis demonstrated a notable absence of IDAs in sperm from both DNAH3-deficient patients and Dnah3 KO mice, resembling the findings of Meng et al. To further investigate, we conducted immunofluorescent staining and western blotting to assess the levels of IDA-associated proteins (DNAH1, DNAH6 and DNALI1) and ODA-associated proteins (DNAH8, DNAH17 and DNAI1) in sperm samples from both our DNAH3-deficient patients and Dnah3 KO mice. Our data revealed a reduction in IDA-associated protein levels and comparable ODA-associated protein levels in comparison to normal controls and WT mice, respectively, thus corroborating the TEM observations. These results suggest that DNAH3 is involved in sperm flagellar development in humans and mice, specifically through its role in the assembly of IDAs.

      Intriguingly, in our study, none of the patients with DNAH3 deficiency reported experiencing any of the principal symptoms associated with PCD. Additionally, our Dnah3 KO mice exhibited normal ciliary development in the lung, brain, eye, and oviduct. Similarly, Meng et al. did not mention any PCD symptoms in their DNAH3-deficient patients, and their Dnah3 KO mice also demonstrated normal ciliary morphology in the trachea and brain. These combined observations suggest that DNAH3 may play a more significant role in sperm flagellar development than in other motile cilia functions. Given that DNAH3 is expressed in ciliary tissues, its role in these tissues remains intriguing and could be elucidated through sequencing of larger cohorts of individuals with PCD.

      We have added these discussions in lines 267 to 283 and lines 300 to 303.

      (1.2) The variants of DNAH3 in four infertile men were identified through whole-exome sequencing. Providing an overview of the WES data would be beneficial to offer additional insights into whether other variants may contribute to the infertility. This could also help explain why ICSI only works for two out of four patients with DNAH3 variants.

      We thank the reviewer for the helpful suggestions.

      We have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467). The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed below (Table R1). A summary of WES has been presented in Table S1.

      Author response table 1.

      Quality of whole exome sequencing on infertile men.

      The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.

      Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, considering the assumption of a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.
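      The screening logic described in the two paragraphs above can be sketched as a pair of boolean filters. This is only an illustrative sketch, not the authors' actual pipeline (which used Exomiser and manual database review): the `Variant` fields, the consequence labels, the function names, and the toy gene names are all hypothetical, and only the 1% allele-frequency cutoff and the order of the criteria are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    max_east_asian_af: float      # highest East Asian allele frequency across databases
    consequence: str              # hypothetical label, e.g. "missense", "intronic"
    predicted_damaging: bool      # summary of in-silico predictions
    testis_expressed: bool        # expressed in human or mouse testis
    zygosity: str                 # "biallelic" or "monoallelic"
    unrelated_disease: bool = False  # linked to a disease unrelated to male infertility


# Hypothetical set of consequences counted as "coding exon or canonical splice site".
CODING_OR_SPLICE = {"missense", "nonsense", "frameshift", "splice_site"}


def is_candidate(v, af_cutoff=0.01):
    """Criteria (1)-(3): rare, coding/splice-site, and predicted damaging."""
    return (
        v.max_east_asian_af < af_cutoff
        and v.consequence in CODING_OR_SPLICE
        and v.predicted_damaging
    )


def passes_followup(v):
    """Expression, disease-relevance, and recessive-inheritance checks."""
    return v.testis_expressed and not v.unrelated_disease and v.zygosity == "biallelic"


def prioritise(variants):
    """Keep only variants that survive both screening stages."""
    return [v for v in variants if is_candidate(v) and passes_followup(v)]


if __name__ == "__main__":
    # Toy input with made-up genes; not patient data.
    toy = [
        Variant("GENE_A", 0.0002, "missense", True, True, "biallelic"),
        Variant("GENE_B", 0.0500, "missense", True, True, "biallelic"),
        Variant("GENE_C", 0.0001, "nonsense", True, False, "biallelic"),
        Variant("GENE_D", 0.0003, "frameshift", True, True, "monoallelic"),
    ]
    print([v.gene for v in prioritise(toy)])
```

      Keeping the two stages as separate functions mirrors the text, where the candidate counts per patient are reported after the first stage and the expression, literature, and zygosity checks are applied afterwards.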

      We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.

      Additionally, we did not identify any pathogenic variants associated with fertilization failure or early embryonic development in the two patients with failed ICSI outcomes. Therefore, these different ICSI outcomes might be attributed to additional unexplained factors from the female partners.

      (1.3) Quantification of images would help substantiate the conclusions, particularly in Figures 2, 3, 4, and 6. Improved images in Figures 3A, 4B, and 4C, would help increase confidence in the claims made.

      We thank the reviewer for these valuable suggestions. We presume that the reviewer means quantification of the images in Figure S6 rather than Figure 6.

      We have compiled statistics for results shown in Figures 2, 3, 4, and S6. Specifically:

      - The percentages of abnormal flagellar morphology in the normal control and patients, associated with the observations in Figure 2A, have been shown in Figure S1A.

      - The percentages of aberrant axonemal ultrastructure in different cross-sections of sperm from the normal control and patients, corresponding to the findings in Figure 3A, have been presented in Figure S1B.

      - The percentages of abnormal flagellar morphology in WT mice and Dnah3 KO mice have been shown in Figure S7A.

      - The percentages of aberrant axonemal arrangement in different cross-sections of sperm from WT mice and Dnah3 KO mice, corresponding to the findings in Figure 4B, have been presented in Figure S7C.

      - The percentages of microtubule doublets presenting IDAs in sperm from WT mice and Dnah3 KO mice, related to Figure 4B, have been detailed in Figure S7D.

      - The percentages of malformed mitochondria in the midpiece of sperm from WT mice and Dnah3 KO mice, associated with the observations in Figure 4C, have been presented in Figure S7E.

      Moreover, we have revised Figures 3A, 4B, and 4C by replacing the unclear TEM images.

      (2) Reviewer #1 (Recommendations for The Authors):

      (2.1) Please add reference(s) that support what is claimed in lines 83-84.

      We are very grateful for the reviewer's careful comments. We have added a reference describing the homology and expression of DNAH3.

      (2.2) In line 286, change "suggested" to "suggest".

      Thanks for the reviewer's comments. We have corrected the grammar.

      (2.3) Please add reference(s) that support what is claimed in lines 359-360.

      According to the reviewer’s suggestions, we have included references detailing the STA-PUT velocity sedimentation for isolation of single human and mouse testicular cells.

      (2.4) In line 365, change "in" to "into".

      Thanks for the reviewer’s careful comments. We have corrected this word.

      (2.5) In Figure 7, I suggest changing "patients" to "wife or partners of patient". Given that the results are indeed from the spouses of the infertile men, I suggest making this small change to keep the consistency and clarity of what the authors did.

      In response to the reviewer’s kind suggestion, we have replaced “Patient” with “partners of Patient” and revised Figure 7.

      (3) Reviewer #2 (Recommendations for The Authors):

      (3.1) A summary of the WES data would be needed (i.e. number of reads, mapping quality, etc). As mentioned in the public review, it would be beneficial to present a summary of all variants identified in the data and clarify whether DNAH3 is the only gene that contains variants and whether these variants have been validated.

      Many thanks for the reviewer’s kind suggestions.

      The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed (see author response table 1). A summary of WES has been presented in Table S1.

      The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.

      Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, considering the assumption of a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.

      We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.

      (3.2) It would be beneficial to the scientific community if the raw data of WES could be uploaded to a public data repository, such as GEO.

      According to the reviewer's suggestion, we have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467) and described its availability in the "Data Availability" section.

      (3.3) In line 115, it is not clear how the prediction was made. Clarifying them by adding citations or describing methods that predict these pathways/functions would help strengthen it.

      Thanks for the reviewer's comments.

      SIFT, PolyPhen-2, MutationTaster and CADD assess the deleteriousness of genetic variants by considering genomic features and evolutionary constraint of the surrounding sequence, or the structural and chemical property alterations caused by the amino acid substitutions. We have added websites and references of these tools in the manuscript (lines 116 to 118).

      Here are the principles of these tools.

      - SIFT considers the position at which the change occurred and the type of amino acid change, and then predicts whether an amino acid substitution in a protein will affect protein function [https://sift.bii.a-star.edu.sg/, PMID: 12824425].

      - The PolyPhen-2 predicts the impact of an amino acid substitution on a human protein by considering several features, including sequence, phylogenetic, and structural information [http://genetics.bwh.harvard.edu/pph2/, PMID: 20354512].

      - The MutationTaster utilizes a Bayes classifier to predict the functional consequences of amino acid substitutions, intronic and synonymous changes, short insertions/deletions (indels), etc. [https://www.mutationtaster.org/, PMID: 24681721].

      - The CADD scores are based on diverse genomic features derived from surrounding sequence context, gene model annotations, evolutionary constraint, epigenetic measurements, and functional predictions [https://cadd.gs.washington.edu/, PMID: 30371827].

      (4) Reviewer #3 (Recommendations for The Authors):

      (4.1) Please ensure that all gene names used in your manuscript have been approved by the HUGO nomenclature committee. For example, "c.3590C>T (p.P1197L)" should be described as "c.3590C>T (Pro1197Leu)".

      In response to the reviewer's suggestion, we have revised all gene and variant names according to the HUGO nomenclature committee and HGVS Variant Nomenclature Committee, respectively.

      (4.2) For Table 1, the authors should provide the rates of abnormal sperm morphologies using the sperm cells from normal male controls.

      Thanks for the reviewer’s careful comments. Consistent with the WHO laboratory manual (World Health Organization. WHO laboratory manual for the examination and processing of human semen. World Health Organization, 2021.), our routine semen analysis establishes 4% as the minimum rate of sperm with normal morphology but does not define the maximum rate of various tail defects. However, we reviewed the routine semen analysis on the normal controls in our study, and the approximate distribution of sperm with various flagellar morphologies in the normal controls was as follows: normal flagella, 78.6%; absent flagella, 1.7%; short flagella, 0.6%; coiled flagella, 12.5%; bent flagella, 7.9%; irregular flagella, 1.8%.

      (4.3) In Table 2, "Mutation Tester" or "Mutation Taster"?

      We thank the reviewer for the comment. It should be "MutationTaster", and we have corrected this mistake in Table 2 and the manuscript.

      (4.4) In Figure 2B, the bars for patient 1 should be aligned. 

      Following the reviewer's valuable suggestion, we have ensured consistent scale bar alignment in Figure 2B and implemented this alignment throughout all other figures.

      (4.5) In Figure 3A, what about the ultrastructure for sperm heads in DNAH3 deficient sperm cell? The authors previously mentioned abnormalities in sperm head morphologies (Figure 2B) in patients with DNAH3 mutations.

      We thank the reviewers for their kind comments. A small fraction of abnormal sperm heads from our patients was captured under TEM, manifesting as round heads with loose chromatin (Author response image 1).

      Author response image 1.

      Ultrastructure of sperm head from DNAH3-deficient infertile men. TEM analysis revealed a fraction of round head with loose chromatin in patients harboring DNAH3 variants. Scale bars, 200 nm.

      (4.6) In Figure S6, the authors should provide the rates of abnormal sperm morphologies for Dnah3 KO male mice.

      In response to the reviewer's valuable suggestion, we have quantified morphological defects in spermatozoa from both Dnah3 KO and WT mice. Compared to about 17% morphological abnormalities in sperm from WT mice, the morphological abnormalities in sperm from Dnah3 KO mice were about 37%. The results are presented in the revised Figure S7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study provides solid evidence that both psychiatric dimensions (e.g. anhedonia, apathy, or depression) and chronotype (i.e., being a morning or evening person) influence effort-based decision-making. Notably, the current study does not elucidate whether there may be interactive effects of chronotype and psychiatric dimensions on decision-making. This work is of importance to researchers and clinicians alike, who may make inferences about behaviour and cognition without taking into account whether the individual may be tested or observed out-of-sync with their phenotype.

      We thank the three reviewers for their comments, and the Editors at eLife. We have taken the opportunity to revise our manuscript considerably from its original form, not least because we feel a number of the reviewers’ suggested analyses strengthen our manuscript considerably (in one instance even clarifying our conclusions, leading us to change our title)—for which we are very appreciative indeed. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affects decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning.

      Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

      We appreciate the Reviewer’s positive view of our research and agree with their assessment of its weaknesses; the study was not designed to assess chronotype-mental health interactions. We hope that our new title and contextualisation makes this clearer. We respond in more detail point-by-point below.

      Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy and anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. One potential concern is that the range of models which were tested was narrow, and other models might have been considered. For example, the Authors might have also tried to fit models with an overall inverse temperature parameter to capture decision noise. One reason for doing so is that some variance in the bias parameter might be attributed to noise, which was not modeled here. Another concern is that the manuscript discusses effort-based choice as a transdiagnostic feature - and there is evidence in other studies that effort deficits are a transdiagnostic feature of multiple disorders. However, because the present study does not investigate multiple diagnostic categories, it doesn't provide evidence for transdiagnosticity, per se.

      We appreciate Reviewer 2’s assessment of our research and agree generally with its weaknesses. We have now addressed the Reviewer’s comments regarding transdiagnosticity in the discussion of our revised version and have addressed their detailed recommendations below (see point-by-point responses).

      In addition to the below specific changes, in our Discussion section, we now have also added the following (lines 538 – 540):

      “Finally, we would like to note that our study is based on a general population sample, rather than a clinical one. Hence, we cannot speak to transdiagnosticity on the level of multiple diagnostic categories.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to potential chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria. I have some suggestions for improvements.

      Weaknesses:

      (1) The novel findings in this manuscript are those pertaining to transdiagnostic and circadian phenotypes. The authors report two separate but "overlapping" effects: individuals high on anhedonia/apathy are less willing to accept offers in the task, and similarly, individuals tested off their chronotype are less willing to accept offers in the task. The authors claim that the latter has implications for studying the former. In other words, because individuals high on anhedonia/apathy predominantly have a late chronotype (but might be tested early in the day), they might accept fewer offers, which could spuriously look like a link between anhedonia/apathy and choices but might in fact be an effect of the interaction between chronotype and time-of-testing. The authors therefore argue that chronotype needs to be accounted for when studying links between depression and effort tasks.

      The authors argue that, if X is associated with Y and Z is associated with Y, X and Z might confound each other. That is possible, but not necessarily true. It would need to be tested explicitly by having X (anhedonia/apathy) and Z (chronotype) in the same regression model. Does the effect of anhedonia/apathy on choices disappear when accounting for chronotype (and time-of-testing)? Similarly, when adding the interaction between anhedonia/apathy, chronotype, and time-of-testing, within the subsample of people tested off their chronotype, is there a residual effect of anhedonia/apathy on choices or not?

      If the effect of anhedonia/apathy disappeared (or got weaker) while accounting for chronotype, this result would suggest that chronotype mediates the effect of anhedonia/apathy on effort choices. However, I am not sure it renders the direct effect of anhedonia/apathy on choices entirely spurious. Late chronotype might be a feature (induced by other symptoms) of depression (such as fatigue and insomnia), and the association between anhedonia/apathy and effort choices might be a true and meaningful one. For example, if the effect of anhedonia/apathy on effort choices was mediated by altered connectivity of the dorsal ACC, we would not say that ACC connectivity renders the link between depression and effort choices "spurious", but we would speak of a mechanism that explains this effect. The authors should discuss in a more nuanced way what a significant mediation by the chronotype/time-of-testing congruency means for interpreting effects of depression in computational psychiatry.

      We thank the Reviewer for pointing out this crucial weakness in the original version of our manuscript. We have now thought deeply about this and agree with the Reviewer that our original results did not warrant our interpretation that reported effects of anhedonia and apathy on measures of effort-based decision-making could potentially be spurious. At the Reviewer’s suggestion, we decided to test this explicitly in our revised version—a decision that has now deepened our understanding of our results, and changed our interpretation thereof.  

      To investigate how the effects of neuropsychiatric symptoms and the effects of circadian measures relate to each other, we have followed the Reviewer’s advice and conducted an additional series of analyses (see below). Surprisingly (to us, but perhaps not the Reviewer) we discovered that all three symptom measures (two of anhedonia, one of apathy) have separable effects from circadian measures on the decision to expend effort (note we have also re-named our key parameter ‘motivational tendency’ to address this Reviewer’s next comment that the term ‘choice bias’ was unclear). In model comparisons (based on leave-one-out information criterion which penalises for model complexity) the models including both circadian and psychiatric measures always win against the models including either circadian or psychiatric measures. In essence, this strengthens our claims about the importance of measuring circadian rhythm in effort-based tasks generally, as circadian rhythm clearly plays an important role even when considering neuropsychiatric symptoms, but crucially does not support the idea of spurious effects: statistically, circadian measures contribute separably from neuropsychiatric symptoms to the variance in effort-based decision-making. We think this is very interesting indeed, and it certainly clarifies (and corrects the inaccuracy in) our original interpretation, and we can only express our thanks to the Reviewer for helping us understand our effect more fully.

      In response to these new insights, we have made numerous edits to our manuscript. First, we changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”. In the remaining manuscript we now refrain from using the word ‘overlapping’ (which could be interpreted as overlapping in explained variance), and instead opted to describe the effects as parallel. We hope our new analyses, title, and clarified/improved interpretations together address the Reviewer’s valid concern about our manuscript’s main weakness.

      We detail these new analyses in the Methods section as follows (lines 800 – 814):

      “4.5.2. Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”

      Results of the outlined analyses are reported in the results section as follows (lines 356 – 383):

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”

      We have now edited parts of our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      (2) It seems that all key results relate to the choice bias in the model (as opposed to reward or effort sensitivity). It would therefore be helpful to understand what fundamental process the choice bias is really capturing in this task. This is not discussed, and the direction of effects is not discussed either, but potentially quite important. It seems that the choice bias captures how many effortful reward challenges are accepted overall which maybe captures general motivation or task engagement. Maybe it is then quite expected that this could be linked with questionnaires measuring general motivation/pleasure/task engagement. Formally, the choice bias is the constant term or intercept in the model for p(accept), but the authors never comment on what its sign means. If I'm not mistaken, people with higher anhedonia but also higher apathy are less likely to accept challenges and thus engage in the task (more negative choice bias). I could not find any discussion or even mention of what these results mean. This similarly pertains to the results on chronotype. In general, "choice bias" may not be the most intuitive term and the authors may want to consider renaming it. Also, given the sign of what the choice bias means could be flipped with a simple sign flip in the model equation (i.e., equating to accepting more vs accepting fewer offers), it would be helpful to show some basic plots to illustrate the identified differences (e.g., plotting the % accepted for people in the upper and lower tertile for the SHAPS score etc).

      We apologise that this was not made clear previously: the meaning and directionality of “choice bias” is indeed central to our results. We also thank the Reviewer for pointing out that the previously used term “choice bias” itself might not be intuitive. We have now changed this to ‘motivational tendency’ (see below) as well as added substantial details on this parameter to the manuscript, including additional explanations and visualisations of the model as suggested by the Reviewer (new Figure 3) and model-agnostic results to aid interpretation (new Figure S3). Note the latter is complex due to our staircasing procedure (see new figure panel D further detailing our staircasing procedure in Figure 2). This shows that participants with more pronounced anhedonia are less likely to accept offers than those with low anhedonia (Fig. S3A), a model-agnostic version of our central result.

      Our changes are detailed below:

      After careful evaluation we have decided to term the parameter “motivational tendency”, hoping that this will present a more intuitive description of the parameter.

      To aid with the understanding and interpretation of the model parameters, and motivational tendency in particular, we have added the following explanation to the main text:

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighed by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (𝛼) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”
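      As a rough illustration of this description, the following Python sketch maps one offer to an acceptance probability under two different motivational tendencies. The linear cost function, the function names and all numeric values are assumptions made for illustration only; they are not taken from the manuscript.

```python
import math


def subjective_value(reward, effort, reward_sens, effort_sens):
    """Combine reward and effort into a subjective value (SV).

    Assumption: a linear cost function; the text above does not state the form.
    """
    return reward_sens * reward - effort_sens * effort


def p_accept(sv, motivational_tendency):
    """Map SV to an acceptance probability with a logistic (two-option softmax)."""
    return 1.0 / (1.0 + math.exp(-(motivational_tendency + sv)))


# Same offer and same sensitivities, but different motivational tendency.
sv = subjective_value(reward=3, effort=3, reward_sens=1.0, effort_sens=1.0)
low = p_accept(sv, motivational_tendency=-1.0)
high = p_accept(sv, motivational_tendency=1.0)
print(f"SV={sv:.2f}  p(accept | low)={low:.3f}  p(accept | high)={high:.3f}")
```

      With the subjective value held fixed, only the intercept changes the acceptance probability, which is the sense in which a higher motivational tendency yields more accepted offers.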

      Further, we have included a new figure, visualizing the model. This demonstrates how the different model parameters contribute to the model (A), and how different values on each parameter affect the model (B-D).

      We agree that plotting model agnostic effects in our data may help the reader gain intuition of what our task results mean. We hope to address this with our added section on “Model agnostic task measures relating to questionnaires”. We first followed the reviewer’s suggestion of extracting subsamples with high and low anhedonia (as measured with the SHAPS, highest and lowest quantile) and plotted the acceptance proportion across effort and reward levels (panel A in figure below). However, due to our implemented task design, this only shows part of the picture: the staircasing procedure individualises which effort-reward combination a participant is presented with. Therefore, group differences in choice behaviour will lead to differences in the development of the staircases implemented in our task. Thus, we plotted the count of offered effort-reward combinations for the subsamples of participants with high vs. low SHAPS scores by the end of the task, averaged across staircases and participants.

      As the aspect of task development due to the implemented staircasing may not have been explained sufficiently in the main text, we have included panel (D) in figure 2.

      Further, we have added the following figure reference to the main text (lines 189 – 193):

      “The development of offered effort and reward levels across trials is shown in figure 2D; this shows that as participants generally tend to accept challenges rather than reject them, the implemented staircasing procedure develops toward higher effort and lower reward challenges.”
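      The drift described here can be sketched with a toy staircase in Python. The update rule, the alternation between effort and reward adjustments, and the four levels per dimension are all hypothetical choices for illustration; the actual staircasing rule is not specified in this response.

```python
def staircase_step(effort, reward, accepted, adjust_effort, n_levels=4):
    """Return the next (effort, reward) offer of one staircase.

    Hypothetical rule: after an accepted offer the next offer is made less
    attractive (more effort or less reward); after a rejected offer it is
    made more attractive. Levels are clamped to 1..n_levels.
    """
    step = 1 if accepted else -1
    if adjust_effort:
        effort = min(n_levels, max(1, effort + step))
    else:
        reward = min(n_levels, max(1, reward - step))
    return effort, reward


# A participant who accepts every offer drifts to high effort / low reward.
effort, reward = 2, 3
for trial in range(6):
    effort, reward = staircase_step(
        effort, reward, accepted=True, adjust_effort=(trial % 2 == 0)
    )
print(effort, reward)
```

      Under this toy rule, a participant who accepts more often is shown harder and less rewarding offers, so groups that differ in acceptance also end up with different sets of offered effort-reward combinations.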

      To statistically test effects of model-agnostic task measures on the neuropsychiatric questionnaires, we performed Bayesian GLMs with the proportion of accepted trials predicted by SHAPS and AES. This is reported in the text as follows.

      Supplement, lines 172 – 189:

      “To explore the relationship between model agnostic task measures and questionnaire measures of neuropsychiatric symptoms, we conducted Bayesian GLMs, with the proportion of accepted trials predicted by SHAPS scores, controlling for age and gender. The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02].

      A visualisation of model agnostic task measures relating to symptoms is given in Fig. S4, comparing subgroups of participants scoring in the highest and lowest quartile on the SHAPS. This shows that participants with a high SHAPS score (i.e., more pronounced anhedonia) are less likely to accept offers than those with a low SHAPS score (Fig. S4A). Due to the implemented staircasing procedure, group differences can also be seen in the effort-reward combinations offered per trial. While for both groups, the staircasing procedure seems to devolve towards high effort – low reward offers, this is more pronounced in the subgroup of participants with a lower SHAPS score (Fig S4B).”

      (3) None of the key effects relate to effort or reward sensitivity which is somewhat surprising given the previous literature and also means that it is hard to know if choice bias results would be equally found in tasks without any effort component. (The only analysis related to effort sensitivity is exploratory and in a subsample of N=56 per group looking at people meeting criteria for MDD vs matched controls.) Were stimuli constructed such that effort and reward sensitivity could be separated (i.e., are uncorrelated/orthogonal)? Maybe it would be worth looking at the % accepted in the largest or two largest effort value bins in an exploratory analysis. It seems the lowest and 2nd lowest effort level generally lead to accepting the challenge pretty much all the time, so including those effort levels might not be sensitive to individual difference analyses?

      We too were initially surprised by the lack of effect of neuropsychiatric symptoms on reward and effort sensitivity. To address the Reviewer’s first comment, the nature of the ‘choice bias’ parameter (now motivational tendency) is its critical importance in the context of effort-based decision-making: it is not modelled or measured explicitly in tasks without effort (such as typical reward tasks), so it would be impossible to test this in tasks without an effort component. 

      For the Reviewer’s second comment, the exploratory MDD analysis is not our only one related to effort sensitivity: the effort sensitivity parameter is included in all of our central analyses, and (like reward sensitivity), does not relate to our measured neuropsychiatric symptoms (e.g., see page 15). Note most previous effort tasks do not include a ‘choice bias’/motivational tendency parameter, potentially explaining this discrepancy. However, our model was quantitatively superior to models without this parameter, for example with only effort- and reward-sensitivity (page 11, Fig. 3).

      Our three model parameters (reward sensitivity, effort sensitivity, and choice bias/motivational tendency) were indeed uncorrelated/orthogonal to one another (see parameter orthogonality analyses below), making it unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity. As per the Reviewer’s suggestion, we also examined whether the lowest two effort levels might not be sensitive to individual differences; in fact, we found that the proportion of accepted trials on the lowest effort levels alone was nevertheless predicted by anhedonia (see ceiling effect analyses below).

      Specifically, in terms of parameter orthogonality:

      When developing our task design and computational modelling approach we were careful to ensure that meaningful neurocomputational parameters could be estimated and that no spurious correlations between parameters would be introduced by modelling. By conducting parameter recoveries for all models, we showed that our modelling approach could reliably estimate parameters, and that estimated parameters are orthogonal to the other underlying parameters (as can be seen in Figure S1 in the supplement). It is thus unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity.
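      The logic of a parameter recovery can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the authors' procedure: it recovers only the motivational tendency with both sensitivities fixed at 1, assumes a linear cost function, and uses a maximum-likelihood grid search rather than Bayesian estimation. All names and values are hypothetical.

```python
import math
import random


def p_accept(reward, effort, alpha, beta_r=1.0, beta_e=1.0):
    """Acceptance probability under an assumed linear-cost logistic model."""
    sv = beta_r * reward - beta_e * effort
    return 1.0 / (1.0 + math.exp(-(alpha + sv)))


def simulate(alpha, n_trials, rng):
    """Simulate accept/reject choices for random offers on levels 1..4."""
    data = []
    for _ in range(n_trials):
        reward, effort = rng.randint(1, 4), rng.randint(1, 4)
        accepted = rng.random() < p_accept(reward, effort, alpha)
        data.append((reward, effort, accepted))
    return data


def log_likelihood(alpha, data):
    """Log-likelihood of the observed choices for a candidate alpha."""
    total = 0.0
    for reward, effort, accepted in data:
        p = p_accept(reward, effort, alpha)
        total += math.log(p if accepted else 1.0 - p)
    return total


def recover_alpha(data):
    """Grid-search maximum-likelihood estimate of alpha on [-3, 3]."""
    grid = [i / 20 for i in range(-60, 61)]
    return max(grid, key=lambda a: log_likelihood(a, data))


rng = random.Random(0)  # fixed seed so the simulated data are reproducible
true_alpha = 0.5
estimate = recover_alpha(simulate(true_alpha, 4000, rng))
print(f"true={true_alpha}  recovered={estimate}")
```

      A full recovery analysis would repeat this over many simulated parameter sets and all parameters jointly, then check that each recovered parameter tracks its own generating value and not the others.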

      And finally, regarding the possibility of a ceiling effect for low effort levels:

      We agree that visual inspection of the proportion of accepted results across effort and reward values can lead to the belief that a ceiling effect prevents the two lowest effort levels from capturing any inter-individual differences. To test whether this is the case, we ran a Bayesian GLM with the SHAPS sum score predicting the proportion of accepted trials (controlling for age and gender), in a subset of the data including only trials with an effort level of 1 or 2. We found the SHAPS has a predictive value for the proportion of accepted trials in the lowest two effort levels (M=-0.05; 95%HDI=[-0.07,-0.02]). This is noted in the text as follows.
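      The outcome variable of this analysis, the proportion of accepted trials restricted to low effort levels, could be computed along the following lines. The trial layout and the example data are made up for illustration and do not come from the study.

```python
def proportion_accepted(trials, max_effort=None):
    """Share of accepted trials, optionally restricted to effort <= max_effort.

    `trials` is a list of (effort_level, accepted) pairs; this layout is
    hypothetical and chosen only for illustration.
    """
    kept = [acc for eff, acc in trials if max_effort is None or eff <= max_effort]
    if not kept:
        raise ValueError("no trials left after filtering")
    return sum(kept) / len(kept)


# Made-up trials for one participant: (effort level, accepted?)
trials = [(1, True), (1, True), (2, True), (2, False),
          (3, True), (3, False), (4, False), (4, False)]
overall = proportion_accepted(trials)
low_effort = proportion_accepted(trials, max_effort=2)
print(overall, low_effort)
```

      Computed per participant, the restricted proportion would then serve as the dependent variable in the regression on SHAPS scores.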

      Supplement, lines 175 – 180:

      “The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02].”

      (4) The abstract and discussion seem overstated (implications for the school system and statements on circadian rhythms which were not measured here). They should be toned down to reflect conclusions supported by the data.

      We thank the Reviewer for pointing this out, and have now removed these claims from the abstract and Discussion; we hope they now better reflect conclusions supported by these data directly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Suggestions for improved or additional experiments, data or analyses.

      - For a non-computational audience, it would be useful to unpack the influence of the choice bias on behavior, as it is less clear how this would affect decision-making than sensitivity to effort or reward. Perhaps a figure showing accept/reject decisions when sensitivities are held and choice bias is high would be beneficial.

      We thank the Reviewer for suggesting additional explanations of the choice bias parameter to aid interpretation for non-computational readers; as per the Reviewer’s suggestion, we have now included additional explanations and visualisations (Figure 3) to make this as clear as possible. Please note also that, in response to one of the other Reviewers and after careful consideration, we have decided to rename the “choice bias” parameter to “motivational tendency”, hoping this will prove more intuitive.

      To aid with the understanding and interpretation of this and the other model parameters, we have added the following explanation to the main text.

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighed by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (𝛼) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”

      Additionally, we add the following explanation to the Methods section.

      Lines 698 – 709:

      First, a cost function transforms costs and rewards associated with an action into a subjective value (SV), weighted by one parameter each for reward and effort sensitivity, with ℛ and 𝐸 denoting reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be in random decision-making.

      This SV is then transformed to a predicted acceptance probability by a softmax function, with 𝛼 as the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).
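      One pair of equations consistent with this verbal description is given below. It is a sketch under stated assumptions: the linear form of the cost function and the symbols $\beta_R$ and $\beta_E$ for reward and effort sensitivity are not specified in this response and are chosen here for illustration, while $\alpha$, $\mathcal{R}$ and $E$ follow the text.

```latex
\begin{align}
SV &= \beta_R \cdot \mathcal{R} - \beta_E \cdot E \\
p(\text{accept}) &= \frac{1}{1 + e^{-(\alpha + SV)}}
\end{align}
```

      In this form, a larger $\alpha$ shifts the acceptance probability upward for every subjective value, and setting $\beta_R = \beta_E = 0$ makes choices independent of the offer.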

      Our new figure (panels A-D in figure 3) visualizes the model. This demonstrates how the different model parameters come into play in the model (A), and how different values on each parameter affect the model (B-D).

      - The early and late chronotype groups have significant differences in ages and gender. Additional supplementary analysis here may mitigate any concerns from readers.

      The Reviewer is right to notice that our subsamples of early and late chronotypes differ significantly in age and gender, but it is important to note that all our analyses comparing these two groups take this into account, statistically controlling for age and gender. We regret that this was previously only mentioned in the Methods section, so this information was not accessible where most relevant. To remedy this, we have amended the Results section as follows.

      Lines 317 – 323:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46]) and motivational tendency (higher in early chronotypes; M=-0.248, 95% HDI=[-0.37,-0.11]), as well as an interaction between chronotype and time-of-day on motivational tendency (M=0.309, 95% HDI=[0.15,0.48]).”

      (2) Recommendations for improving the writing and presentation.

      - I found the term 'overlapping' a little jarring. I think the authors use it to mean both neuropsychiatric symptoms and chronotypes affect task parameters, but they are not tested to be 'separable', nor is an interaction tested. Perhaps being upfront about how interactions are not being tested here (in the introduction, and not waiting until the discussion) would give an opportunity to operationalize this term.

      We agree with the Reviewer that our previously-used term “overlapping” was not ideal: it may have been misleading, and was not necessarily reflective of the nature of our findings. We now state explicitly that we are not testing an interaction between neuropsychiatric symptoms and chronotypes in our primary analyses. Additionally, following suggestions made by Reviewer 3, we ran new exploratory analyses to investigate how the effects of neuropsychiatric symptoms and circadian measures on motivational tendency relate to one another. These results in fact show that all three symptom measures have separable effects from circadian measures on motivational tendency. This supports the Reviewer’s view that ‘overlapping’ was entirely the wrong word—although it nevertheless shows the important contribution of circadian rhythm as well as neuropsychiatric symptoms in effort-based decision-making. We have changed the manuscript throughout to better describe this important, more accurate interpretation of our findings, including replacing the term “overlapping”. We changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”.

      To clarify the intention of our primary analyses, we have added the following to the last paragraph of the introduction.

      Lines 107 – 112:

      “Next, we pre-registered a follow-up experiment to directly investigate how circadian preference interacts with time-of-day on motivational decision-making, using the same task and computational modelling approach. While this allows us to test how circadian effects on motivational decision-making compare to neuropsychiatric effects, we do not test for possible interactions between neuropsychiatric symptoms and chronobiology.”

      We detail our new analyses in the Methods section as follows.

      Lines 800 – 814:

      “4.5.2 Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”

      Results of the outlined analyses are reported in the Results section as follows.

      Lines 356 – 383:

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”

      In addition to the title change, we edited our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that the neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      - A minor point, but it could be made clearer that many neurotransmitters have circadian rhythms (and not just dopamine).

      We agree this should have been made clearer, and have added the following to the Introduction.

      Lines 83 – 84:

      “Bi-directional links between chronobiology and several neurotransmitter systems have been reported, including dopamine47.

      (47) Kiehn, J.-T., Faltraco, F., Palm, D., Thome, J. & Oster, H. Circadian Clocks in the Regulation of Neurotransmitter Systems. Pharmacopsychiatry 56, 108–117 (2023).”

      - Making reference to other studies which have explored circadian rhythms in cognitive tasks would allow interested readers to explore the broader field. One such paper is: Bedder, R. L., Vaghi, M. M., Dolan, R. J., & Rutledge, R. B. (2023). Risk taking for potential losses but not gains increases with time of day. Scientific reports, 13(1), 5534, which also includes references to other similar studies in the discussion.

      We thank the Reviewer for pointing out that we failed to cite this relevant work. We have now included it in the Introduction as follows.

      Lines 97 – 98:

      “A circadian effect on decision-making under risk is reported, with the sensitivity to losses decreasing with time-of-day66.

      (66) Bedder, R. L., Vaghi, M. M., Dolan, R. J. & Rutledge, R. B. Risk taking for potential losses but not gains increases with time of day. Sci Rep 13, 5534 (2023).”

      (3) Minor corrections to the text and figures.

      None, clearly written and structured. Figures are high quality and significantly aid understanding.

      Reviewer #2 (Recommendations For The Authors):

      I did have a few more minor comments:

      - The manuscript doesn't clarify whether trials had time limits - so that participants might fail to earn points - or instead they did not and participants had to continue exerting effort until they were done. This is important to know since it impacts on decision-strategies and behavioral outcomes that might be analyzed. For example, if there is no time limit, it might be useful to examine the amount of time it took participants to complete their effort - and whether that had any relationship to choice patterns or symptomatology. Or, if they did, it might be interesting to test whether the relationship between choices and exerted effort depended on symptoms. For example, someone with depression might be less willing to choose effort, but just as, if not more likely to successfully complete a trial once it is selected.

      We thank the Reviewer for pointing out this important detail in the task design, which we should have made clearer. The trials did indeed have a time limit which was dependent on the effort level. To clarify this in the manuscript, we have made changes to Figure 2 and the Methods section. We agree it would be interesting to explore whether the exerted effort in the task related to symptoms. We explored this in our data by predicting the participant average proportion of accepted but failed trials by SHAPS score (controlling for age and gender). We found no relationship: M=0.01, 95% HDI=[-0.001,0.02]. However, it should be noted that the measure of proportion of failed trials may not be suitable here, as there are only a few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting. As an alternative measure, we explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the extent to which subjects exert “extra effort”. We have now added this to the manuscript.

      In Figure 2, which explains the task design in the results section, we have added the following to the figure description.

      Lines 161 – 165:

      “Each trial consists of an offer with a reward (2,3,4, or 5 points) and an effort level (1,2,3, or 4, scaled to the required clicking speed and time the clicking must be sustained for) that subjects accept or reject. If accepted, a challenge at the respective effort level must be fulfilled for the required time to win the points.”

      In the Methods section, we have added the following.

      Lines 617 – 622:

      “We used four effort-levels, corresponding to a clicking speed at 30% of a participant’s maximal capacity for 8 seconds (level 1), 50% for 11 seconds (level 2), 70% for 14 seconds (level 3), and 90% for 17 seconds (level 4). Therefore, in each trial, participants had to fulfil a certain number of mouse clicks (dependent on their capacity and the effort level) in a specific time (dependent on the effort level).”
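The mapping from effort level to a click target can be sketched as follows. The percentages and durations are those quoted above; everything else is an assumption made for illustration: that maximal capacity is expressed in clicks per second, that the target is the product of fraction, capacity and duration, and that fractional targets are rounded up. The names `EFFORT_LEVELS` and `required_clicks` are hypothetical.

```python
import math

# Effort level -> (percent of maximal clicking capacity, duration in seconds),
# as stated in the quoted Methods text.
EFFORT_LEVELS = {1: (30, 8), 2: (50, 11), 3: (70, 14), 4: (90, 17)}

def required_clicks(max_clicks_per_second, level):
    # ASSUMPTION: capacity is measured in clicks per second and the target
    # is rounded up to the next whole click.
    percent, duration = EFFORT_LEVELS[level]
    return math.ceil(percent * max_clicks_per_second * duration / 100)

# Targets for a participant whose calibrated maximum is 10 clicks per second.
targets = {level: required_clicks(10, level) for level in EFFORT_LEVELS}
```

Because both the clicking speed and the duration increase with the level, the click target grows faster than either quantity alone.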

      In the Supplement, we have added the additional analyses suggested by the Reviewer.

      Lines 195 – 213:

      “3.2 Proportion of accepted but failed trials

      For each participant, we computed the proportion of trials in which an offer was accepted, but the required effort then not fulfilled (i.e., failed trials). There was no relationship between average proportion of accepted but failed trials and SHAPS score (controlling for age and gender): M=0.01, 95% HDI=[-0.001,0.02]. However, there are intentionally few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting.”

      “3.3 Exertion of “extra effort”

      We also explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the extent to which subjects exert “extra effort”.”
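A minimal sketch of the click-exceedance measure, assuming the factor is the ratio of clicks made to clicks required and that the subject-level score is a plain mean over accepted and succeeded trials (the manuscript may instead average within effort-by-reward cells first). The trial dictionary keys and the function name are hypothetical.

```python
from statistics import mean

def subject_exceedance(trials):
    # Keep only trials that were accepted AND completed successfully.
    factors = [
        t["clicks_made"] / t["clicks_required"]
        for t in trials
        if t["accepted"] and t["succeeded"]
    ]
    # ASSUMPTION: simple mean over qualifying trials; None if there are none.
    return mean(factors) if factors else None

example_trials = [
    {"accepted": True, "succeeded": True, "clicks_made": 12, "clicks_required": 10},
    {"accepted": True, "succeeded": True, "clicks_made": 20, "clicks_required": 10},
    {"accepted": True, "succeeded": False, "clicks_made": 5, "clicks_required": 10},
    {"accepted": False, "succeeded": False, "clicks_made": 0, "clicks_required": 10},
]
score = subject_exceedance(example_trials)
```

Here the failed and rejected trials are ignored, so `score` is the mean of the two ratios 12/10 and 20/10.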

      - Perhaps relatedly, there is evidence that people with depression show less of an optimism bias in their predictions about future outcomes. As such, they show more "rational" choices in probabilistic decision tasks. I'm curious whether the Authors think that a weaker choice bias among those with stronger depression/anhedonia/apathy might be related. Also, are choices better matched with actual effort production among those with depression?

      We think this is a very interesting comment, but unfortunately feel our manuscript cannot properly speak to it: as in our response to the previous comment, our exploratory analysis linking the proportion of accepted but failed trials to anhedonia symptoms (i.e. less anhedonic people making more optimistic judgments of their likelihood of success) did not show a relationship between the two. However, this null finding may be the result of our task design which is not laid out to capture such an effect (in fact to minimize trials of this nature). We have added to the Discussion section.

      Lines 442 – 445:

      “It is possible that a higher motivational tendency reflects a more optimistic assessment of future task success, in line with work on the optimism bias95; however our task intentionally minimized unsuccessful trials by titrating effort and reward; future studies should explore this more directly.

      (95) Korn, C. W., Sharot, T., Walter, H., Heekeren, H. R. & Dolan, R. J. Depression is related to an absence of optimistically biased belief updating about future life events. Psychological Medicine 44, 579–592 (2014).”

      - The manuscript does not clarify: How did the Authors ensure that each subject received each effort-reward combination at least once if a given subject always accepted or always rejected offers?

      We have made the following edit to the Methods section to better explain this aspect of our task design.

      Lines 642 – 655:

      “For each subject, trial-by-trial presentation of effort-reward combinations was made semi-adaptively by 16 randomly interleaved staircases. Each of the 16 possible offers (4 effort-levels x 4 reward-levels) served as the starting point of one of the 16 staircases. Within each staircase, after a subject accepted a challenge, the next trial’s offer on that staircase was adjusted (by increasing effort or decreasing reward). After a subject rejected a challenge, the next offer on that staircase was adjusted by decreasing effort or increasing reward. This ensured subjects received each effort-reward combination at least once (as each participant completed all 16 staircases), while individualizing trial presentation to maximize the trials’ informative value. Therefore, in practice, even in the case of a subject rejecting all offers (and hence the staircasing procedures always adapting by decreasing effort or increasing reward), the full range of effort-reward combinations will be represented in the task across the starting points of all staircases (and therefore before adaptation takes place).”
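The staircase logic described above can be sketched as follows. Several details are not given in the text and are assumptions here: steps of one level, a random choice between the two permitted adjustments, offers held at the grid boundary when no adjustment is possible, interleaving by shuffling, and the number of trials per staircase. The names `next_offer`, `run_session` and `decide` are hypothetical.

```python
import random

EFFORTS = [1, 2, 3, 4]
REWARDS = [2, 3, 4, 5]

def next_offer(effort, reward, accepted, rng):
    # After acceptance the offer gets harder (more effort or less reward);
    # after rejection it gets easier (less effort or more reward).
    moves = []
    if accepted:
        if effort < EFFORTS[-1]:
            moves.append((effort + 1, reward))
        if reward > REWARDS[0]:
            moves.append((effort, reward - 1))
    else:
        if effort > EFFORTS[0]:
            moves.append((effort - 1, reward))
        if reward < REWARDS[-1]:
            moves.append((effort, reward + 1))
    # ASSUMPTION: pick one permitted step at random; stay put at the boundary.
    return rng.choice(moves) if moves else (effort, reward)

def run_session(decide, trials_per_staircase, seed=0):
    rng = random.Random(seed)
    # One staircase per starting offer; the value is its current offer.
    current = {(e, r): (e, r) for e in EFFORTS for r in REWARDS}
    schedule = [start for start in current for _ in range(trials_per_staircase)]
    rng.shuffle(schedule)  # randomly interleave the 16 staircases
    presented = []
    for start in schedule:
        effort, reward = current[start]
        accepted = decide(effort, reward)
        presented.append((effort, reward, accepted))
        current[start] = next_offer(effort, reward, accepted, rng)
    return presented

# A subject who rejects everything still sees all 16 starting offers.
offers = run_session(lambda effort, reward: False, trials_per_staircase=4)
seen = {(e, r) for e, r, _ in offers}
```

Because the first trial of every staircase is its unadapted starting point, `seen` covers the full 4 x 4 grid regardless of the decision rule passed in.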

      - The word "metabolic" is misspelled in Table 1

      - Figure 2 is missing panel label "C"

      - The word "effort" is repeated on line 448.

      We thank the Reviewer for their attentive reading of our manuscript and have corrected the mistakes mentioned.

      Reviewer #3 (Recommendations For The Authors):

      It is a bit difficult to get a sense of people's discounting from the plots provided. Could the authors show a few example individuals and their fits (i.e., how steep was effort discounting on average and how much variance was there across individuals; maybe they could show the mean discount function or some examples etc)

      We appreciate very much the Reviewer's suggestion to visualise our parameter estimates within and across individuals. We have implemented this in Figure S2.

      It would be helpful if correlations between the various markers used as dependent variables (SHAPS, DARS, AES, chronotype etc) could be plotted as part of each related figure (e.g., next to the relevant effects shown).

      We agree with the Reviewer that a visual representation of the various correlations between dependent variables would be a better and more accessible communication than our current paragraph listing the correlations. We have implemented this by adding a new figure plotting all correlations in a heat map, with asterisks indicating significance.

      The authors use the term "meaningful relationship" - how is this defined? If undefined, maybe consider changing (do they mean significant?)

      We understand how our use of the term “(no) meaningful relationship” was confusing here. As we conducted most analyses in a Bayesian fashion, this is a formal definition of ‘meaningful’: the 95% highest density interval does not span across 0. However, we do not want this to be misunderstood as frequentist “significance” and agree clarity can be improved here. To avoid confusion, we have amended the manuscript where relevant (i.e., we now state “we found a (/no) relationship / effect” rather than “we found a meaningful relationship”).
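The decision criterion stated above (a 95% highest density interval that does not span 0) can be illustrated with posterior samples. This is a generic sketch, not the authors' code: it takes the HDI to be the narrowest interval containing the requested share of sorted samples, which is a common sample-based approximation, and the function names are hypothetical.

```python
import math

def hdi(samples, mass=0.95):
    # Narrowest window containing `mass` of the sorted posterior samples.
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return xs[start], xs[start + k - 1]

def excludes_zero(samples, mass=0.95):
    # "Effect found" in the sense used above: the HDI lies entirely on
    # one side of zero.
    low, high = hdi(samples, mass)
    return low > 0 or high < 0

clearly_positive = [i / 100 for i in range(1, 101)]   # 0.01 ... 1.00
centred_on_zero = [i / 100 for i in range(-50, 51)]   # -0.50 ... 0.50
```

Applied to these toy samples, `excludes_zero(clearly_positive)` holds while `excludes_zero(centred_on_zero)` does not.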

      The authors do not include an inverse temperature parameter in their discounting models-can they motivate why? If a participant chose nearly randomly, which set of parameter values would they get assigned?

      Our decision to not include an inverse temperature parameter was made after an extensive simulation-based investigation of different models and task designs. A series of parameter recovery studies including models with an inverse temperature parameter revealed the inverse temperature parameter could not be distinguished from the reward sensitivity parameter. Specifically, inverse temperature seemed to capture the variance of the true underlying reward sensitivity parameter, leading to confounding between the two. Hence, including both reward sensitivity and inverse temperature would not have allowed us to reliably estimate either parameter. As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space. We have now added these simulations to our supplement.

      Nevertheless, we believe our models can capture random decision-making. The parameters of effort and reward sensitivity capture how sensitive one is to changes in effort/reward level. Hence, random decision-making can be interpreted as low effort and reward sensitivity, such that one’s decision-making is not guided by changes in effort and reward magnitude. With low effort/reward sensitivity, the motivational tendency parameter (previously “choice bias”) would capture to what extent this random decision-making is biased toward accepting or rejecting offers.

      The simulation results are now detailed in the Supplement.

      Lines 25 – 46:

      “1.2.1 Parameter recoveries including inverse temperature

      In the process of task and model space development, we also considered models incorporating an inverse temperature parameter. To this end, we conducted parameter recoveries for four models, defined in Table S3.

      Parameter recoveries indicated that parameters can be recovered reliably in model 1, which includes only effort sensitivity and inverse temperature as free parameters (on-diagonal correlations: .98 > r > .89, off-diagonal correlations: .04 > |r| > .004). However, when a reward sensitivity parameter is added to the model (model 2), parameter recovery seems to be compromised, as parameters are estimated less accurately (on-diagonal correlations: .80 > r > .68), and spurious correlations between parameters emerge (off-diagonal correlations: .40 > |r| > .17). This issue remains when motivational tendency is added to the model (model 4; on-diagonal correlations: .90 > r > .65; off-diagonal correlations: .28 > |r| > .03), but not when inverse temperature is modelled with effort sensitivity and motivational tendency, but not reward sensitivity (model 3; on-diagonal correlations: .96 > r > .73; off-diagonal correlations: .05 > |r| > .003).

      As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space.”

      And we now discuss random decision-making specifically in the Methods section.

      Lines 698 – 709:

      “First, a cost function transforms costs and rewards associated with an action into a subjective value (SV):

      with and for reward and effort sensitivity, and ℛ and 𝐸 for reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be in random decision-making.

      This SV is then transformed to an acceptance probability by a softmax function:

      with for the predicted acceptance probability and 𝛼 for the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).”

      The pre-registration mentions effects of BMI and risk of metabolic disease-those are briefly reported in the factor loadings, but not discussed afterwards-although the authors stated hypotheses regarding these measures in their preregistration. Were those hypotheses supported?

      We reported these results (albeit only briefly) in the factor loadings resulting from our PLS regression and results from follow-up GLMs (see below). We have now amended the Discussion to enable further elaboration on whether they confirmed our hypotheses (this evidence was unclear, but we have subsequently followed up in a sample with type-2 diabetes, who also show reduced motivational tendency).

      Lines 258 – 261:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression.”

      We have added the following paragraph to our discussion.

      Lines 491 – 502:

      “To our surprise, we did not find statistical evidence for a relationship between effort-based decision-making and measures of metabolic health (BMI and risk for type-2 diabetes). Our analyses linking BMI to motivational tendency reveal a numeric effect in line with our hypothesis: a higher BMI relating to a lower motivational tendency. However, the 95% HDI for this effect narrowly included zero (95%HDI=[-0.19,0.01]). Possibly, our sample did not have sufficient variance in metabolic health to detect dimensional metabolic effects in a current general population sample. A recent study by our group investigates the same neurocomputational parameters of effort-based decision-making in participants with type-2 diabetes and non-diabetic controls matched by age, gender, and physical activity105. We report a group effect on the motivational tendency parameter, with type-2 diabetic patients showing a lower tendency to exert effort for reward.”

      “(105) Mehrhof, S. Z., Fleming, H. A. & Nord, C. A cognitive signature of metabolic health in effort-based decision-making. Preprint at https://doi.org/10.31234/osf.io/4bkm9 (2024).”

      R-values are indicated as a range (e.g., from 0.07-0.72 for the last one in 2.1 which is a large range). As mentioned above, the full correlation matrix should be reported in figures as heatmaps.

      We agree with the Reviewer that a heatmap is a better way of conveying this information – see Figure 1 in response to their previous comment.  

      The answer on whether data was already collected is missing on the second preregistration link. Maybe this is worth commenting on somewhere in the manuscript.

      This question appears missing because, as detailed in the manuscript, we felt that technically some data *was* already collected by the time our second pre-registration was posted. This is because the second pre-registration detailed an additional data collection, with the goal of extending data from the original dataset to include extreme chronotypes and increase precision of analyses. To avoid any confusion regarding the lack of reply to this question in the pre-registration, we have added the following disclaimer to the description of the second pre-registration:

      “Please note the lack of response to the question regarding already collected data. This is because the data collection in the current pre-registration extends data from the original dataset to increase the precision of analyses. While this original data is already collected, none of the data collection described here has taken place.”

      Some referencing is not reflective of the current state of the field (e.g., for effort discounting: Sugiwaka et al., 2004 is cited). There are multiple labs that have published on this since then including Philippe Tobler's and Sven Bestmann's groups (e.g., Hartmann et al., 2013; Klein-Flügge et al., Plos CB, 2015).

      We agree absolutely, and have added additional, more recent references on effort discounting.

      Lines 67 – 68:

      “Higher costs devalue associated rewards, an effect referred to as effort-discounting33–37.”

      (33) Sugiwaka, H. & Okouchi, H. Reformative self-control and discounting of reward value by delay or effort1. Japanese Psychological Research 46, 1–9 (2004).

      (34) Hartmann, M. N., Hager, O. M., Tobler, P. N. & Kaiser, S. Parabolic discounting of monetary rewards by physical effort. Behavioural Processes 100, 192–196 (2013).

      (35) Klein-Flügge, M. C., Kennerley, S. W., Saraiva, A. C., Penny, W. D. & Bestmann, S. Behavioral Modeling of Human Choices Reveals Dissociable Effects of Physical Effort and Temporal Delay on Reward Devaluation. PLOS Computational Biology 11, e1004116 (2015).

      (36) Białaszek, W., Marcowski, P. & Ostaszewski, P. Physical and cognitive effort discounting across different reward magnitudes: Tests of discounting models. PLOS ONE 12, e0182353 (2017).

      (37) Ostaszewski, P., Bąbel, P. & Swebodziński, B. Physical and cognitive effort discounting of hypothetical monetary rewards. Japanese Psychological Research 55, 329–337 (2013).

      There are lots of typos throughout (e.g., Supplementary martial, Mornignness etc)

      We thank the Reviewer for their attentive reading of our manuscript and have corrected our mistakes.

      In Table 1, it is not clear what the numbers given in parentheses are. The figure note mentions SD, IQR, and those are explicitly specified for some rows, but not all.

      After reviewing Table 1 we understand the comment regarding the clarity of the number in parentheses. In our original manuscript, for some variables, numbers were given per category (e.g. for gender and ethnicity), rather than per row, in which case the parenthetical statistic was indicated in the header row only. However, we now see that the clarity of the table would have been improved by adding the reported statistic for each row—we have corrected this.

      In Figure 1C, it would be much more helpful if the different panels were combined into one single panel (using differently coloured dots/lines instead of bars).

      We agree visualizing the proportion of accepted trials across effort and reward levels in one single panel aids interpretability. We have implemented it in the following plot (now Figure 2C).

      In Sections 2.2.1 and 4.2.1, the authors mention "mixed-effects analysis of variance (ANOVA) of repeated measures" (same in the preregistration). It is not clear if this is a standard RM-ANOVA (aggregating data per participant per condition) or a mixed-effects model (analysing data on a trial-by-trial level). This model seems to only include within-subjects variables, so it isn't a "mixed ANOVA" mixing within and between subjects effects.

      We apologise that our use of the term "mixed-effects analysis of variance (ANOVA) of repeated measures" is indeed incorrectly applied here. We aggregate data per participant and effort-by-reward combination, meaning there are no between-subject effects tested. We have corrected this to “repeated measures ANOVA”.

      In Section 2.2.2, the authors write "R-hats>1.002" but probably mean "R-hats < 1.002". ESS is hard to evaluate unless the total number of samples is given.

      We thank the Reviewer for noticing this mistake and have corrected it in the manuscript.

      In Section 2.3, the inference criterion is unclear. The authors first report "factor loadings" and then perform a permutation test that is not further explained. Which of these factors are actually needed for predicting choice bias out of chance? The permutation test suggests that the null hypothesis is just "none of these measures contributes anything to predicting choice bias", which is already falsified if only one of them shows an association with choice bias. It would be relevant to know for which measures this is the case. Specifically, it would be relevant to know whether adding circadian measures into a model that already contains apathy/anhedonia improves predictive performance.

      We understand the Reviewer’s concerns regarding the detail of explanation we have provided for this part of our analysis, but we believe there may have been a misunderstanding regarding the partial least squares (PLS) regression. Rather than identifying a number of factors to predict the outcome variable, a PLS regression identifies a model with one or multiple components, each with factor loadings of differing magnitude. In our case, the PLS regression identified a model with one component to best predict our outcome variable (motivational tendency, which in our previous version we called choice bias). This one component had factor loadings of our questionnaire-based measures, with measures of apathy and anhedonia having the highest weights, followed by lower-weighted loadings for measures of circadian rhythm and metabolic health. The permutation test assesses whether this component (consisting of the combination of factor loadings) can predict the outcome variable out of sample.

      We hope we have improved clarity on this in the manuscript by making the following edits to the Results section.

      Lines 248 – 251:

      “Permutation testing indicated the predictive value of the resulting component (with factor loadings described above) was significant out-of-sample (root-mean-squared error [RMSE]=0.203, p=.001).”

      Further, we hope to provide a more in-depth explanation of these results in the Methods section.

      Lines 755 – 759:

      “Statistical significance of obtained effects (i.e., the predictive accuracy of the identified component and factor loadings) was assessed by permutation tests, probing the proportion of root-mean-squared errors (RMSEs) indicating stronger or equally strong predictive accuracy under the null hypothesis.”
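
      The logic of such a permutation test can be sketched as below. This is a minimal sketch under stated assumptions, not the analysis pipeline of the study: the one-component PLS1 fit is hand-written in NumPy, the five contiguous cross-validation folds, the 199 permutations, and the "+1" correction in the p-value are illustrative choices, and all function names are hypothetical.

```python
import numpy as np


def fit_pls1(X, y):
    """Fit a one-component PLS1 model and return what prediction needs."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)   # weight vector of the single component
    t = Xc @ w                  # component scores
    b = (t @ yc) / (t @ t)      # regression of the outcome on the scores
    return x_mean, y_mean, w, b


def predict_pls1(model, X):
    x_mean, y_mean, w, b = model
    return y_mean + b * ((X - x_mean) @ w)


def cv_rmse(X, y, n_folds=5):
    """Out-of-sample RMSE from K-fold cross-validation (contiguous folds)."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    sq_err = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit_pls1(X[train], y[train])
        sq_err.append((predict_pls1(model, X[test_idx]) - y[test_idx]) ** 2)
    return float(np.sqrt(np.concatenate(sq_err).mean()))


def permutation_p_value(X, y, n_perm=199, seed=0):
    """Proportion of null RMSEs that are as good as or better than observed."""
    rng = np.random.default_rng(seed)
    observed = cv_rmse(X, y)
    # Shuffling y breaks any link between predictors and outcome (the null).
    null = np.array([cv_rmse(X, rng.permutation(y)) for _ in range(n_perm)])
    p = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, float(p)
```

      Calling `permutation_p_value(X, y)` with a predictor matrix and an outcome vector returns the observed out-of-sample RMSE and its permutation p-value; a small p-value means few shuffled datasets predicted as well as the real one.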

      In Section 2.5, the authors simply report "that chronotype showed effects of chronotype on reward sensitivity", but the direction of the effect (higher reward sensitivity in early vs. late chronotype) remains unclear.

      We thank the Reviewer for pointing this out. While we did report the direction of effect, this was only presented in the subsequent parentheticals and could have been made much clearer. To assist with this, we have made the following addition to the text.

      Lines 317 – 320:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46])”

      In Section 4.2, the authors write that they "implemented a previously-described procedure using Prolific pre-screeners", but no reference to this previous description is given.

      We thank the Reviewer for bringing our attention to this missing reference, which has now been added to the manuscript.

      In Supplementary Table S2, only the "on-diagonal correlations" are given, but off-diagonal correlations (indicative of trade-offs between parameters) would also be informative.

      We agree with the Reviewer that off-diagonal correlations between underlying and recovered parameters are crucial to assess confounding between parameters during model estimation. We reported this in Figure S1D, where we present the full correlation matrix between underlying and recovered parameters in a heatmap. We have now noticed that this plot was missing axis labels, which have now been added.

      I found it somewhat difficult to follow the results section without having read the methods section beforehand. At the beginning of the Results section, could the authors briefly sketch the outline of their study? Also, given they have a pre-registration, could the authors introduce each section with a statement of what they expected to find, and close with whether the data confirmed their expectations? In the current version of the manuscript, many results are presented without much context of what they mean.

      We agree that a brief outline of the study procedure before reporting the results would make the subsequent text easier to follow, and we have added the following to the end of our Introduction.

      Lines 101 – 106:

      “Here, we tested the relationship between motivational decision-making and three key neuropsychiatric syndromes: anhedonia, apathy, and depression, taking both a transdiagnostic and categorical (diagnostic) approach. To do this, we validate a newly developed effort-expenditure task, designed for online testing, and gamified to increase engagement. Participants completed the effort-expenditure task online, followed by a series of self-report questionnaires.”

      We have added references to our pre-registered hypotheses at multiple points in our manuscript.

      Lines 185 – 187:

      “In line with our pre-registered hypotheses, we found significant main effects for effort (F(1,14367)=4961.07, p<.0001) and reward (F(1,14367)=3037.91, p<.001), and a significant interaction between the two (F(1,14367)=1703.24, p<.001).”

      Lines 215 – 221:

      “Model comparison by out-of-sample predictive accuracy identified the model implementing three parameters (motivational tendency a, reward sensitivity , and effort sensitivity ), with a parabolic cost function (subsequently referred to as the full parabolic model) as the winning model (leave-one-out information criterion [LOOIC; lower is better] = 29734.8; expected log posterior density [ELPD; higher is better] = -14867.4; Fig. 31ED). This was in line with our pre-registered hypotheses.”

      Lines 252 – 258:

      “Bayesian GLMs confirmed evidence for psychiatric questionnaire measures predicting motivational tendency (SHAPS: M=-0.109; 95% highest density interval (HDI)=[-0.17,-0.04]; AES: M=-0.096; 95%HDI=[-0.15,-0.03]; DARS: M=-0.061; 95%HDI=[-0.13,-0.01]; Fig. 4A). Post-hoc GLMs on DARS sub-scales showed an effect for the sensory subscale (M=-0.050; 95%HDI=[-0.10,-0.01]). This result of neuropsychiatric symptoms predicting a lower motivational tendency is in line with our pre-registered hypothesis.”

      Lines 258 – 263:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no meaningful relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression. This null finding for dimensional measures of circadian rhythm and metabolic health was not in line with our pre-registered hypotheses.”

      Lines 268 – 270:

      “For reward sensitivity, the intercept-only model outperformed models incorporating questionnaire predictors based on RMSE. This result was not in line with our pre-registered expectations.”

      Lines 295 – 298:

      “As in our transdiagnostic analyses of continuous neuropsychiatric measures (Results 2.3), we found evidence for a lower motivational tendency parameter in the MDD group compared to HCs (M=-0.111, 95% HDI=[-0.20,-0.03]) (Fig. 4B). This result confirmed our pre-registered hypothesis.”

      Lines 344 – 355:

      “Late chronotypes showed a lower motivational tendency than early chronotypes (M=-0.11, 95% HDI=[-0.22,-0.02])—comparable to effects of transdiagnostic measures of apathy and anhedonia, as well as diagnostic criteria for depression. Crucially, we found motivational tendency was modulated by an interaction between chronotype and time-of-day (M=0.19, 95% HDI=[0.05,0.33]): post-hoc GLMs in each chronotype group showed this was driven by a time-of-day effect within late, rather than early, chronotype participants (M=0.12, 95% HDI=[0.02,0.22], such that late chronotype participants showed a lower motivational tendency in the morning testing sessions, and a higher motivational tendency in the evening testing sessions; early chronotype: 95% HDI=[-0.16,0.04]) (Fig. 5A). These results of a main effect and an interaction effect of chronotype on motivational tendency confirmed our pre-registered hypothesis.”

      Lines 390 – 393:

      “Participants with an early chronotype had a lower reward sensitivity parameter than those with a late chronotype (M=0.27, 95% HDI=[0.16,0.38]). We found no effect of time-of-day on reward sensitivity (95%HDI=[-0.09,0.11]) (Fig. 5B). These results were in line with our pre-registered hypotheses.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths: 

      Overall the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation of abeta and tau oligomers disrupting the activity of VIP neurons is plausible.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we have revised the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method, together with automated analysis, allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We have included the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      - 3xTg mice show early Aß accumulation in VIP-positive interneurons.

      - 3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      - 3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      - 3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      - 3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      - 3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Limitations:

      (1) The authors should describe their model and state the age at which these mice start depositing amyloid plaques and neurofibrillary tangles. Readers might not be familiar with this model. It is also important to mention that circuit disruptions are assessed prior to plaque and tangle formation.

      We have included a detailed description of the 3xTg-AD mouse model in the Introduction section, including information on the age at which amyloid plaques and neurofibrillary tangles begin to appear. Additionally, we have clarified that circuit disruptions were assessed before the formation of plaques and tangles. These details have been added to both the Introduction and the Results sections to ensure clarity for readers unfamiliar with the model.

      (2) Ns are presented in Supplemental Table 1. Units are presented in a note to Supplementary Table 1. It would be advisable to specify Ns and units as the data is being presented in the results section or figure legends for easy access.

      We have now included the Ns (sample sizes), specifying the number of cells or sections and the number of experimental animals, directly within the Results section and in the figure legends. This ensures that readers have immediate access to this information without needing to refer to the supplementary materials.

      (3) Several typos require correction:

      a. "mamory" - Line 22, page 5.

      b. The term "Interneurons" is abbreviated as both "INs" and "IN" throughout the manuscript. The author should consistently choose one abbreviation.

      We have corrected the typo "mamory" to "memory" on line 22, page 5. Additionally, we have standardized the abbreviation for "Interneurons" to "INs" throughout the manuscript for consistency.

      (4) Note 2 in Supplementary Table 1 states that animals of both sexes with equal distribution were used throughout the study. It would be best for the reader to assess the data distribution based on sex. Thus, it is advisable for the authors to depict male and female data points as distinct symbols throughout the figures.

      Unfortunately, we do not have detailed sex-disaggregated data for all datasets, which limits our ability to depict male and female data points separately across all figures. Therefore, we have opted to pool data from both sexes for a more comprehensive analysis. We believe this approach maintains the robustness of our findings.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      - To keep the logical line of reasoning and to be able to interpret the results, it would be important to use the same metrics when comparing the population activity of O/A interneurons and principal cells in the different behavioral conditions.

      We have revised Figures 4 and 5 to enhance the coherence in data presentation. This includes using consistent metrics for comparing the population activity of both O/A interneurons and principal cells across different behavioral conditions. These changes ensure a clearer and more logical interpretation of the results.

      - Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality. Would it be possible to test if manipulating VIP neurons one could obtain such specific results? Alternatively, it could be discussed more in detail how the decrease in disinhibition could lead to the changes in network activity demonstrated here.

      We agree with the reviewer that establishing causality between VIP neuron deficits and changes in network activity would be very important. However, demonstrating causality would require a new line of investigation, involving the use of specific mouse models to selectively manipulate VIP neurons. This is an exciting direction that we plan to prioritize in our future research. For this study, we have included a discussion on the potential mechanisms by which decreased disinhibition might lead to the observed changes in network activity. Specifically, we propose that in young adult 3xTg-AD mice, the altered firing of I-S3 cells may lead to enhanced inhibition of principal cells. This could shift the excitation/inhibition balance, input integration and firing output of principal cells thereby impacting overall network activity. These points are discussed in detail in the revised Discussion section.

      - Along the same lines, the correlations shown in the manuscript would be more robust if there was an in vivo demonstration that 3xTg mice indeed show decreased activity in vivo. The same experiments could also clarify if VIP cells in control animals are more active at the time of decision-making and during object exploration as suggested in the manuscript.

      Thank you for your comment. In response to the point raised, we would like to highlight that we have recently documented the increased activity of VIP-INs in the D-zone of the T-maze and during object exploration in a study published in Cell Reports (Tamboli et al., 2024). This publication is now referenced in our manuscript to support our findings. Regarding the in vivo activity of 3xTg mice, our observations indicated no significant differences in major behavioral patterns such as locomotion, rearing, and exploration of the T-maze when comparing Tg and non-Tg mice. These findings are presented in detail in Figure 4c and Supplementary Fig. 5. We believe these data support the robustness of our correlations by demonstrating that the overall behavioral activity of 3xTg mice is comparable to that of non-transgenic controls, thus focusing attention on the specific roles of VIP-INs in the early prodromal state of AD pathology.

      Minor Points:

      - Figure 1c: Heading of VIP-Tg should have capital letters.

      Thank you for pointing that out. We have corrected the heading to "VIP-Tg" with capital letters in Figure 1c.

      - Figure 1d: The finding that no change was observed in the percentage of VIP+/CR+ is based on three animals and 3-4 slices per mouse. However, the result of VIP+CR+ in tg-mice has an outlier that might bias the results. I would suggest increasing the number of animals to confirm these results.

      Thank you for your insightful suggestion. We addressed the potential impact of the outlier in the VIP+/CR+ cell density analysis by recalculating the results after removing the outlier using the interquartile range method. This reanalysis revealed a statistically significant difference in the VIP+/CR+ cell density between non-Tg and Tg mice, which we have now detailed in the Results section. Despite this, we have chosen to retain the outlier in our final presentation to accurately represent the biological variability observed in our sample. We agree that increasing the number of animals would further validate these findings and will consider this in future studies.
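
      For readers unfamiliar with the interquartile range method, a minimal sketch follows. It assumes the conventional Tukey fences at 1.5 times the IQR and the "inclusive" quartile definition from Python's standard library; the response does not state which multiplier or quartile definition was used, and the function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: k=1.5 and inclusive quartiles; other conventions exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

      With small samples, such as a few slices from three animals, the flagged set can change with the quartile definition, which is one reason to report results both with and without the flagged point.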

      - Figure 3d: Would it be possible to identify the recorded interneurons? Is it expected that most of those are OLM cells?

      Thank you for your question. We were unable to fully recover all recorded cells using biocytin staining. However, for those cells with preserved axonal structures, we identified both OLM and bistratified cells, which are the primary targets of I-S3 cells. We have now included this information in the Results section to clarify the types of interneurons identified.

      - Figure 3: Why quantify VGat terminals instead of quantification of VIP-GFP terminals? Combined with the calretinin labeling it would be more useful to indicate that no changes were observed at the morphological bouton level specifically in disinhibitory interneurons. Please also describe which ImageJ plugin was used for the quantification.

      Thank you for your question. Our primary objective was to quantify the synaptic terminals of CR+ INs in the CA1 O/A region, which are predominantly formed by I-S3 cells. Therefore, VGaT and CR co-localization was used to guide this analysis. GFP expression in axonal boutons can sometimes be inconsistent and less reliable for precise quantification. For this analysis, we utilized the “Analyze Particles” function in ImageJ, combined with watershed segmentation, which is now specified in the Methods section.

      - Figure 4g: How was the statistical test performed? If data was averaged across mice, please add error bars and data points in the figure.

      Thank you for your question. To compare the alternation percentage between non-Tg and Tg mice, we used Fisher’s Exact test as detailed in Supplementary Table 1. In this analysis, we considered each animal's choice individually, comparing the preference for correct versus incorrect choices between the two groups. Since Fisher’s Exact test is designed for analyzing qualitative data rather than quantitative data, averaging across mice was not applicable, and therefore, we did not include error bars or data points in the figure.
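
      The structure of such a test on per-animal choices can be sketched as below. The counts are taken from the numbers of incorrect choices mentioned elsewhere in this response (4 of 9 non-Tg and 1 of 6 Tg mice); treating them as the exact table behind Figure 4g is an assumption made for illustration, and the hand-written function is a sketch rather than the software used for the reported statistics.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: non-Tg, Tg; columns: correct, incorrect (illustrative counts, see above).
p_value = fisher_exact_two_sided(5, 4, 5, 1)
```

      Because each animal contributes a single categorical outcome, the test operates on counts, so there is no per-animal mean or spread to plot.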

      - Figure 4h: To conclude that the increase in activity is larger in the 3xTg mice, there should be a statistical comparison for the magnitude of change between the decision and the stem zone for control and 3xTg mice. To show that there is no significant difference in this measurement in the control mice is insufficient.

      Thank you for your suggestion. We performed a statistical comparison of the magnitude of change in activity between the stem zone and the D-zone for non-Tg and 3xTg mice, as recommended. Our analysis showed no significant difference in this magnitude of change between the two genotypes. These results have now been included in the Results section. However, we would like to highlight an important finding regarding the nature of these changes. In the 3xTg mice, there was a consistent increase in the activity of O/A INs when entering the D-zone. In contrast, non-Tg mice displayed a range of responses, including both increases and decreases in activity. This indicates a higher reliability in the firing of O/A INs in the D-zone of 3xTg mice. Our recent study suggests that VIP-INs are particularly active in the D-zone (Tamboli et al., 2024). Therefore, the absence or reduced input from VIP-INs in 3xTg mice may lead to the observed higher engagement of O/A INs in this zone. We believe this observation is crucial for understanding the differential yet nuanced changes in neural dynamics in these mice.

      - In the methods, it is stated that there was a pre-selection of animals depending on learning performance. Would it be possible to also show the data from animals that did not properly learn? Alternatively, it would be useful to plot the correlation between performance in this test and the difference between activity in the stem and the decision-making zone. The reason to ask for this is that there is a trend for control animals to show reduced alternations (50 vs 80%, although not significant, it is a big difference). Considering that there is also a trend in control animals to show increased activity in the decision-making zone, it would be important to confirm that this is not only due to differences in performance. The current statistical procedure does not allow discarding this.

      In this study, we excluded from the analysis the animals that refused to explore the T-maze and spent all their time in the stem corner, or refused to explore the objects and stayed in the open field maze (OFM) corner. These exclusions applied to both non-Tg (n = 6) and Tg (n = 5) groups, indicating that low exploratory activity is not necessarily linked to AD-related mutations. During the T-maze test, we also observed several animals that made incorrect choices (4 out of 9 non-Tg and 1 out of 6 Tg mice). However, due to the low number of animals making incorrect choices, we were unable to form a separate group for analysis based on incorrect choices. These details are now provided in the Methods section.

      - Figure 4i. It is not clear when exactly cell activity was measured. If it was during the entire recording time, I think it would be interesting to see if the activity of O/A interneurons is different specifically during interaction with the object in 3xTg mice.

      Cell activity was indeed measured throughout the entire recording session and analyzed in relation to animal behavior (immobility to walking; Fig. 4d,e), and periods specifically related to interaction with objects were extracted for analysis (Figure 4i).

      - Why was the object modulation measured during a different task in which both objects were the same? The figure is misleading in that sense, as it suggests the experiment was the same as for the other panels with two different objects. It would be important to correct this if the authors want to correlate the deficits in NOR in 3xTg mice and changes in IN activity.

      The study specifically investigated object-modulated neural activity during the Sampling phase. Therefore, two identical objects were placed in the arena for animal exploration. As mentioned above, several animals failed to explore the OFM and objects on the second day and were excluded from the analysis, which prevented us from conducting the novel-object exploration Test Trial. Both non-Tg and Tg mice showed a lack of exploration in the OFM and T-maze, for reasons that remain unclear. Consequently, we opted to present robust data on neural activity during the initial sampling of two identical objects. However, further investigation is needed to understand how this activity relates to deficits observed in the classical NOR test.

      - Figure. 5c-f. I would strongly suggest performing the same quantification and displaying similar figures for the fiber photometry experiments in interneurons and principal cells. It would help to interpret the data.

      We have taken the reviewer's suggestion into account and standardized the data analysis and presentation. Figures 4d, e and 5c, d now depict the walk-induced activity in INs and PCs, respectively. Figures 4h and 5f compare activity between the stem and D-zone in the T-maze. Additionally, Figures 4j and 5h illustrate the object modulation of INs and PCs, respectively.

      - Although velocity and mobility were quantified, it would be important to show also that they are not different during those times when activity was dissimilar, as in the decision zone.

      We have analyzed these data and found no significant differences between the two genotypes in terms of velocity and mobility during these periods. This analysis is now presented in Supplementary Figure 5e, f and detailed in the Results section.

      - Figure 5g-h. Similarly, I would suggest using the same metrics in order to correlate the results from interneuron and principal cell activity photometry.

      We have updated this figure to align with the presentation of interneurons (Figure 4j) and included RMS analysis to emphasize lower variance in object modulation of PCs as an indicator of increased network inhibition.

      - Was object modulation variance also different for INs depending on the mouse phenotype?

      We conducted this additional analysis but did not find any significant difference.

      - Figure S4: would it be possible to identify the postsynaptic partners?

      As mentioned above, for those cells with preserved axonal structures, we identified both OLM and bistratified cells. We have now included this information in the Results section to clarify the types of interneurons identified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors address a fundamental unresolved question in cerebellar physiology: do synapses between granule cells (GCs) and Purkinje cells (PCs) made by the ascending part of the axon (AA) have different synaptic properties from those made by parallel fibers? This is an important question, as GCs integrate sensorimotor information from numerous brain areas with a precise and complex topography.

      Summary:

      The authors argue that GCs located close to PCs essentially contact PC dendrites via the ascending part of their axons. They demonstrate that joint high-frequency (100 Hz) stimulation of distant parallel fibers and local GCs potentiates AA-PC synapses, while parallel fiber-PC synapses are depressed. On the basis of paired-pulse ratio analysis, they concluded that evoked plasticity was postsynaptic. When individual pathways were stimulated alone, no LTP was observed. This associative plasticity appears to be sensitive to timing, as stimulation of parallel fibers first results in depression, while stimulation of the AA pathway has no effect. NMDA, mGluR1 and GABAA receptors are involved in this plasticity.

      Strengths:

      Overall, the associative modulation of synaptic transmission is convincing, and the experiments carried out support this conclusion. However, weaknesses limit the scope of the results.

      Weaknesses:

      One of the main weaknesses of this study is the suggestion that high-frequency parallel-fiber stimulation cannot induce long term potentiation unless combined with AA stimulation. Although we acknowledge that the stimulation and recording conditions were different from those of other studies, according to the literature (e.g. Bouvier et al 2016, Piochon et al 2016, Binda et al, 2016, Schonewille et al 2021 and others), high-frequency stimulation of parallel fibers leads to long-term postsynaptic potentiation under many different experimental conditions (blocked or unblocked inhibition, stimulation protocols, internal solution composition). Furthermore, in vivo experiments have confirmed that high-frequency parallel fibers are likely to induce long-term potentiation (Jorntell and Ekerot, 2002; Wang et al, 2009).

      This article provides further evidence that long-term plasticity (LTP and LTD) at this connection is a complex and subtle mechanism underpinned by many different transduction pathways. It would therefore have been interesting to test different protocols or conditions to explain the discrepancies observed in this dataset.

      Even though this is not the main result of this study, we acknowledge that the control experiments done on PF stimulation add a puzzling result to an already contradictory literature. High frequency parallel fibre stimulation (in isolation) has been shown to induce long term potentiation in vitro, but not always, and most importantly, this has been shown in vivo. This was the reason for choosing that particular stimulation protocol. Examination of in vitro studies, however, shows that the results are variable and even contradictory. Most were done in the presence of GABAA receptor antagonists, including the SK channel blocker Bicuculline, whereas in the study by Binda (2016), LTP was blocked by GABAA receptor inhibition. In some studies also, LTP was under the control of NMDAR activation only, whereas in Binda (2016), it was under the control of mGluR activation. Moreover, most experiments were done in mice, whereas our study was done in rats. Our results reveal multiple mechanisms working together to produce plasticity, which are highly sensitive to in vitro conditions. We designed our experiments to be close to the physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to reproduce PF-LTP, but it was not the aim of this study to dissect the subtleties of the different experimental protocols and models.

      We have modified the Discussion to cover that point fully.

      Another important weakness is the lack of evidence that the AAs were stimulated. Indeed, without filling the PC with fluorescent dye or biocytin during the experiment, and without reconstructing the anatomical organization, it is difficult to assess whether the stimulating pipette is positioned in the GC cluster that is potentially in contact with the PC with the AAs. According to EM microscopy, AAs account for 3% of the total number of synapses in a PC, which could represent a significant number of synapses. Although the idea that AAs repeatedly contact the same Purkinje cell has been propagated, to the best of the review author's knowledge, no direct demonstration of this hypothesis has yet been published. In fact, what has been demonstrated (Walter et al 2009; Spaeth et al 2022) is that GCs have a higher probability of being connected to nearby PCs, but are not necessarily associated with AAs.

      We fully agree with the reviewer that we have not identified morphologically ascending axon synapses, and we stress this fact both in the first paragraph of the Results section, and again at the beginning of Discussion. Our point is mainly topographical, given the well documented geometrical organisation of the cerebellar cortex. Strictly speaking, inputs are local (including AAs) or distal (PFs). Similarly, the studies by Isope and Barbour (2002) and Walter et al. (2009), just like Sims and Hartell (2005 and 2006), have coined the term ‘ascending axon’ when drawing conclusions about locally stimulated inputs. Moreover, our results do not rely on or assume multiple contacts, stronger connections, or higher probability of connections between ascending axons and Purkinje cells. Our results only demonstrate a different plasticity outcome for the two types of inputs. Therefore, our manuscript could be rephrased with the terms ‘local’ and ‘distal’ granule cell inputs, but this would have no more implication for the results or the computation performed in Purkinje cells. However, in our experience, these terms are more confusing, and consistent with the literature, we do not wish to make this modification. However, we have modified the abstract of the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a form of synaptic plasticity at synapses from granule cells onto Purkinje cells in the mouse cerebellum, which is specific to synapses proximal to the cell body but not to distal ones. This plasticity is induced by the paired or associative stimulation of the two types of synapses because it is not observed with stimulation of one type of synapse alone. In addition, this form of plasticity is dependent on the order in which the stimuli are presented, and is dependent on NMDA receptors, metabotropic glutamate receptors and to some degree on GABAA receptors. However, under all experimental conditions described, there is a progressive weakening or run-down of synaptic strength. Therefore, plasticity is not relative to a stable baseline, but relative to a process of continuous decline that occurs whether or not there is any plasticity-inducing stimulus.

      As highlighted by the reviewer, we observed a postsynaptic rundown of the EPSC amplitude for both input pathways. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation, and the progressive decrease of the EPSC amplitude during the course of an experiment leads to an underestimate of the absolute potentiation. We have taken the view to provide a strong set of control data rather than selecting experiments based on subjective criteria or applying a cosmetic compensation procedure. We have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown. Comparison shows a highly significant potentiation of the ascending axon EPSC. Depression of the parallel fibre EPSC, on the other hand, was not significantly different from rundown, and we have not spoken of parallel fibre long term depression. The data show thus very clearly that ascending axon and parallel fibre synapses behave differently following the costimulation protocol.

      Strengths:

      The focus of the authors on the properties of two different synapse-types on cerebellar Purkinje cells is interesting and relevant, given previous results that ascending and parallel fiber synapses might be functionally different and undergo different forms of plasticity. In addition, the interaction between these two synapse types during plasticity is important for understanding cerebellar function. The demonstration of timing and order-dependent potentiation of only one pathway, and not another, after associative stimulation of both pathways, changes our understanding of potential plasticity mechanisms. In addition, this observation opens up many new questions on underlying intracellular mechanisms as well as on its relevance for cerebellar learning and adaptation.

      Weaknesses and suggested improvements:

      A concern with this study is that all recordings demonstrate "rundown", a progressive decrease in the amplitude of the EPSC, starting during the baseline period and continuing after the plasticity-induction stimulus. In the absence of a stable baseline, it is hard to know what changes in strength actually occur at any set of synapses. Moreover, the issues that are causing rundown are not known and may or may not be related to the cellular processes involved in synaptic plasticity. This concern applies in particular to all the experiments where there is a decrease in synaptic strength.

      We have provided an answer to that point directly below the summary paragraph. We will just add here that if the phenomenon causing rundown was involved in plasticity, it should affect plasticity of both inputs, which was not the case, clearly distinguishing the ascending axon and parallel fibre inputs.

      The authors should consider changes in the shape of the EPSC after plasticity induction, as in Fig 1 (orange trace) as this could change the interpretation.

      Figure 1 shows an average response composed of evoked excitatory and inhibitory synaptic currents. The third section of Supplementary material (supplementary figure 3) shows that this complex shape is given by an EPSC followed by a delayed disynaptic IPSC. We would like to point out that while separating EPSC from IPSC might appear difficult from average traces due to the averaged jitter in the onset of the synaptic currents, boundaries are much clearer when analysing individual traces. In the same section we discuss the results of experiments in which transient applications of SR 95531 before and after the induction protocol allowed us to measure the EPSC, while maintaining the same experimental conditions during induction. Analysis of the kinetics of the EPSCs during SR application at the beginning and end of experiments showed that there is no change in the time to peak of either the AA or the PF response. The decay times of AA- and PF-EPSCs are slightly longer at the end of the experiment, even if the difference is not significant for AA inputs. This analysis has been added to the Supplementary material. Our analysis, which uses as templates the EPSC kinetics measured at the beginning and at the end of the experiments, takes these changes directly into account. The results show clearly that the presence of disynaptic inhibition doesn’t significantly affect the measure of the peak EPSC after the induction protocol nor the estimate of plasticity.

      In addition, the inconsistency with previous results is surprising and is not explained; specifically, that no PF-LTP was induced by PF-alone repeated stimulation.

      In our experimental conditions, PF-LTP was not induced when stimulating PF only, the condition that reproduces experiments in the literature. As discussed in our response to reviewer 1, a close look at the literature, however, reveals variabilities and contradictions behind seemingly similar results. They reveal intricate mechanisms working together to produce plasticity, which are sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to observe PF-LTP. We have modified the Discussion section to cover that point thoroughly in the context of past results. 

      The authors test the role of NMDARs, GABAARs and mGluRs in the phenotype they describe. The data suggest that the form of plasticity described here is dependent on any one of the three receptors. However, the location of these receptors varies between the Purkinje cells, granule cells and interneurons. The authors do not describe a convincing hypothetical model in which this dependence can be explained. They suggest that there is crosstalk between AA and PF synapses via endocannabinoids downstream of mGluR or NO downstream of NMDARs. However, it is not clear how this could lead to the long-term potentiation that they describe. Also, there is no long-lasting change in paired-pulse ratio, suggesting an absence of changes in presynaptic release.

      We suggest in the Results section that the transient change in paired pulse ratio (PPR) is linked to a transient presynaptic effect, but there was no significant long term change of the PPR, suggesting that the long term effects observed are linked to postsynaptic changes. We now stress this point in the Results and Discussion sections.

      Concerning the involvement of multiple molecular pathways, investigators often tested for the involvement of NMDAR or mGluRs in cerebellar plasticity, rarely both. Here we showed that both pathways are involved. The conjunctive requirement for NMDAR and mGluR activation could easily be explained based on the dependence of cerebellar LTP and LTD on the concentrations of both NO and postsynaptic calcium (Coesmans et al., 2004; Safo and Regehr, 2005; Bouvier et al., 2016; Piochon et al., 2016).

      We also observed an effect of GABAergic inhibition. GABAergic inhibition was elegantly shown by Binda (2016) to regulate calcium entry together with mGluRs, and control plasticity induction. A similar mechanism could contribute to our results, although inhibition might have additional effects. We have modified the Discussion of the manuscript to clarify the pathways involved in plasticity and added a diagram to highlight the links between the different molecular pathways, potential cross talk mechanisms, and the location of receptors.

      Is the synapse that undergoes plasticity correctly identified? In this study, since GABAergic inhibition is not blocked for most experiments, PF stimulation can result in both a direct EPSC onto the Purkinje cell and a disynaptic feedforward IPSC. The authors do address this issue with Supplementary Fig 3, where the impact of the IPSC on the EPSC within the EPSC/IPSC sequence is calculated. However, a change in waveform would complicate this analysis. An experiment with pharmacological blockade will make the interpretation more robust. The observed dependence of the plasticity on GABAA receptors is an added point in favor of the suggested additional experiments.

      We did consider that due to long recording times there might be kinetic changes, and that’s the reason why the experiments of Supplementary figure 3 were done with pharmacological blockade of GABAAR with SR, both before and again after LTP induction. The estimate of the amplitude of the EPSC is based on the actual kinetics of the response at both times.

      A primary hypothesis of this study is that proximal, or AA, and distal, or PF, synapses are different and that their association is specifically what drives plasticity. The alternative hypothesis is that the two synapse-types are the same. Therefore, a good control for pairing AA with PF would be to pair AA with AA and PF with PF, thereby demonstrating that pairing with each other is different from pairing with self.

      Pairing AA with AA would be difficult because stimulation of AA can only be made from a narrow band below the PC and we would likely end up stimulating overlapping sets of synapses. However, Figure 5 shows the effect of stimulating PF and PF, while also mimicking the sparse and dense configuration of the control experiment. It shows that sparse PF do not behave like AA. Sims and Hartell (2006) also made an experiment with sparse PF inputs and observed clear differences between sparse local (AA) and sparse distal (PF) synapses.

      It is hypothesized that the association of a PF input with an AA input is similar to the association of a PF input with a CF input. However, the two are very different in terms of cellular location, with the CF input being in a position to directly interact with PF-driven inputs. Therefore, there are two major issues with this hypothesis: 1) how can subthreshold activity at one set of synapses affect another located hundreds of micrometers away on the same dendritic tree? 2) There is evidence that the CF encodes teaching/error or reward information, which is functionally meaningful as a driver of plasticity at PF synapses. The AA synapse on one set of Purkinje cells is carrying exactly the same information as the PF synapses on another set of Purkinje cells further up and down the parallel fiber beam. It is suggested that the two inputs carry sensory vs. motor information, which is why this form of plasticity was tested. However, the granule cells that lead to both the AA and PF synapses are receiving the same modalities of mossy fiber information. Therefore, one needs to presuppose different populations of granule cells for sensory and motor inputs or receptive field and contextual information. As a consequence, which granule cells lead to AA synapses and which to PF synapses will change depending on which Purkinje cell you're recording from. And that's inconsistent with there being a timing dependence of AA-PF pairing in only one direction. Overall, it would be helpful to discuss the functional implications of this form of plasticity.

      We do not hypothesise that association of the AA and PF inputs is similar to the association of PF and climbing fibre inputs. We compare them because it is the other known configuration triggering associative plasticity in Purkinje cells. It is indeed interesting to observe that even if the inputs are very small compared to the powerful climbing fibre input, they can be effective at inducing plasticity. Physiologically, the climbing fibre signal has been linked to error and reward signals, but reward signals are also encoded by granule cell inputs (Wagner et al., 2017). We have modified the discussion to make sure that we do not suggest equivalence with CF induced LTD.

      Moreover, we fully agree that AA and PF synapses made up by a given granule cell carry the same information, and cannot encode sensory and motor information at the same time. AA synapses from a local granule cell deliver information about the local receptive field, but PF synapses from the same granule cell will deliver contextual information about that receptive field to distant Purkinje cells. In the context of sensorimotor learning, movement is learnt with respect to a global context, not in isolation, therefore learning a particular association must be relevant. The associative plasticity we describe here could help explain this functional association. We have clarified the discussion.

      Reviewer #3 (Public Review):

      Granule cells' axons bifurcate to form parallel fibers (PFs) and ascending axons (AAs). While the significance of PFs on cerebellar plasticity is widely acknowledged, the importance of AAs remains unclear. In the current paper, Conti and Auger conducted electrophysiological experiments in rat cerebellar slices and identified a new form of synaptic plasticity in the AA-Purkinje cell (PC) synapses. Upon simultaneous stimulation of AAs and PFs, AA-PC EPSCs increased, while PFs-EPSCs decreased. This suggests that synaptic responses to AAs and PFs in PCs are jointly regulated, working as an additional mechanism to integrate motor/sensory input. This finding may offer new perspectives in studying and modeling cerebellum-dependent behavior. Overall, the experiments are performed well. However, there are two weaknesses. First, the baseline of electrophysiological recordings is influenced significantly by run-down, making it difficult to interpret the data quantitatively. The amplitude of AA-EPSCs is relatively small and the run-down may mask the change. The authors should carefully reexamine the data with appropriate controls and statistics. Second, while the authors show AA-LTP depends on mGluR, NMDA receptors, and GABA-A receptors, which cell types express these receptors and how they contribute to plasticity is not clarified. The recommended experiments may help to improve the quality of the manuscript.

      As highlighted by the reviewer and developed above in response to reviewer 2, we observed a postsynaptic rundown of the EPSC amplitude. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation. Moreover, we have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown, and provide a baseline. Comparison shows a highly significant potentiation of the ascending axon EPSC, relative to baseline and relative to these control experiments. Depression of the parallel fibre EPSC on the other hand was not significantly different from rundown. For that reason we have not spoken of parallel fibre long term depression. The data, however, show that ascending axon and parallel fibre synapses behave very differently following the costimulation protocol.

      We have discussed above in our response to reviewer 2 the potential involvement of mGluRs, NMDARs and GABAARs. We have clarified the discussion of the pathways involved in plasticity and added a diagram to highlight the links between the different molecular pathways, potential cross talk mechanisms, and the location of receptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - If Chloride concentration cannot be modified, recordings should be performed at the Chloride reversal potential to avoid strong bias in amplitude measurements (e.g. in Figures 3 and 5 outward current was observed while not visible in Figures 1 and 4).

      The balance between excitation and inhibition dictates whether there is a visible outward component, and this varies with the connections tested. Careful control experiments with SR application presented in supplementary figure 3 show that the delay of the IPSC does not significantly affect measurement of the peak amplitude of the EPSC. The reversal potential for Cl⁻ in our study (-85 mV), chosen to reproduce the physiological gradient in Purkinje cells, is too low to record from Purkinje cells at this potential in good conditions as it activates the hyperpolarisation activated cation current Ih, generating huge inward currents.

      - It is not clear whether, during the current clamp, the potential was maintained at -65 mV throughout the induction protocol.

      The potential was set and maintained around -65 mV during the induction protocol. The Methods section has been amended to specify that point.

      - Experiments using GABAB or endocannabinoid antagonists would have been interesting to assess the role of presynaptic plasticity occluding postsynaptic plasticity.

      We are not sure why the reviewer suggested these particular experiments to test for the role of presynaptic plasticity. GABAB and endocannabinoid receptor activation both have presynaptic effects at granule cell to Purkinje cell synapses. They decrease release probability, and as a result increase the paired pulse ratio (Dittman and Regehr, 1997; Safo and Regehr, 2005). Here we only observed a transient decrease of the paired pulse ratio. Additionally, presynaptic endocannabinoid receptor activation, linked to postsynaptic mGluR1 activation and release of endocannabinoids, was shown to be required for induction of postsynaptic PF-LTD (Safo and Regehr, 2005). This effect required climbing fibre stimulation and mGluR activation. Here we show that mGluR1 inhibition did not inhibit the PF depression nor affect the transient change in PPR. Therefore there is no indication that activation of these receptors could induce a pre-synaptic depression occluding postsynaptic plasticity.

      - To give credit to this new plasticity in contradiction with many previous studies, induction pathways should be addressed more deeply.

      As developed earlier in response to the public review, this study does not contradict previous studies, except maybe that by Binda et al. (2016), conducted on mice. From our point of view, our study in fact reconciles past results which have alternatively involved the mGluR or NMDAR pathways, whereas the molecular downstream pathways they recruit can easily cooperate. We aim to describe a new phenomenon and we cannot cover the mechanistic dissection which has been performed to date on plasticity in the cerebellar cortex.

      - The quality of the figures could be enhanced by modifying the dashed line.

      We have made the dashed line more discreet.

      Reviewer #2 (Recommendations For The Authors):

      - Is there cross-talk between the two synaptic pathways?

      In order to explain the associative nature of AA-LTP we suggest that a signal is generated at the AA input during the induction protocol only when the PF input is also stimulated, i.e. a form of cross-talk takes place between the two synaptic territories. We have not tested for cross-talk during control conditions but we discuss the fact that given the size of the Purkinje cell dendritic tree, the size of the inputs and their geometrical configuration, it is highly unlikely. We discuss possible cross-talk mechanisms.

      - Clarification question: "While the peak amplitude of the first response in the pair of stimulations showed a progressive decline, the peak amplitude of the second response of both AA and PF underwent either LTP or LTD respectively..." Does this mean that all LTP/LTD figures show the amplitude of the second EPSC in the paired pulse stimulation, and that the first EPSC has a different response? If so, this should be mentioned in the Methods section and implications discussed.

      All figures show both the amplitude of the first and second EPSCs in the pair of stimulations. In Figure 1A, 3A, 4A and 5B the paired stimulation protocol is depicted with colours and symbols used in the associated graphs, with closed symbols for the first and open symbols for the second EPSC. Figure legends have been amended to clarify this point. The average values given in the Results section and figure legends relate to the first EPSC only for clarity. As can be seen from the figures, long term plasticity affected the first and second EPSC in a very similar manner. However, individual symbols show that during a transient period, the first and second EPSCs are differentially affected by the induction protocol, resulting in a transient change of the PPR.

      Minor suggestions:

      - It would be helpful to have a reference for the statement that 1-2% of stimulated fibers come from nearby GCs when stimulation is distal.

      We have modified the text to explain our calculation based on the data of Pichitpornchai et al., 1994 (p. 4, Results section).

      - Does the shading over the plasticity time course traces come from the standard error of the mean?

      Shading over the plasticity time course plots shows the standard error of the mean. This is now clearly stated in figure legends.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Whether the plasticity between AAs and PCs is regulated by the post-synaptic or pre-synaptic mechanisms should be addressed or discussed. Based on the results of PPR (mostly unchanged after induction), the post-synaptic mechanism may be more significant. Supplemental Figure 2C shows a trend toward a positive correlation between AA-LTP and the number of spikes, suggesting intracellular calcium levels in the post-synaptic Purkinje cells may be important. Whether this is true or not can be directly tested by the addition of BAPTA in the recording pipettes.

      The absence of a long lasting effect on the paired pulse ratio (PPR) indicates that postsynaptic mechanisms are involved in long term changes. This is in line with the dependence of plasticity induced with similar protocols on the concentrations of NO and postsynaptic calcium, both affecting postsynaptic targets, as developed in our response to reviewer 2. BAPTA interferes with calcium and mGluR signalling, and could be used to further confirm the involvement of a postsynaptic mechanism, however, we did not wish to pursue further the dissection of the signalling cascade. We have modified the Results and Discussion sections to include a discussion of pre and postsynaptic mechanisms.

      (2) Most results from the plasticity experiments are shown as average/sem and do not include individual data, making it hard to appreciate the magnitude of the changes. The authors could show the individual data at some time points (e.g. 5 min before and 30 min after induction), plot bar-graphs (Figure 2C with individual data), or boxplots to compare different conditions and perform statistics.

      Individual data points are now visible for plasticity induction in Figure 2C and Supplementary Figure 2 for a number of conditions. Statistics have been performed as detailed in the text and legend of Fig 2.

      (3) In addressing point #2, it is strongly recommended that the authors include the values for controls without induction because AA/PF-EPSCs undergo significant run-down. In most experiments, the authors compare the magnitude of plasticity with baseline changes in Supplemental Figure 1. This should not be appropriate for some experiments, such as Figures 3 & 4, where pharmacological treatments are performed. The authors should carefully consider including the appropriate controls from baseline recording to rule out significant confound by the run-down.

      We agree that control experiments without stimulation (no Stim) are only appropriate controls for the initial synchronous stimulation and AA and PF only experiments (Fig 1). All the other experiments were compared to the synchronous stimulation experiments, not to control No Stim. The synchronous stimulation protocol is strictly the same as that applied in experiments with pharmacological treatments and the appropriate control to test whether treatments affected plasticity. This is now systematically specified in the Results section.

      (4) The authors recorded mixed EPSC/IPSCs and used a fitting approach to extract EPSCs. Applying AMPA-receptor blockers to check that extracted IPSCs are correctly predicted may solidify the reliability of the approach. An additional concern is that this approach can only be used if the waveform of EPSC/IPSC does not change with plasticity. The authors should compare the waveforms between conditions to address this point.

      Fits were not used to extract EPSCs. EPSCs were isolated by blocking IPSCs with SR95531, and the IPSCs were then extracted by subtraction from the mixed EPSC/IPSC. Fits were then done of the isolated EPSC and the extracted IPSC. This procedure was applied both at the start of the experiment and at the end to avoid changes in kinetics that would influence measurements. A section of supplementary material is devoted to this analysis. Isolating IPSCs using AMPAR blockers is not possible as IPSCs are disynaptic. AMPAR blockers would fully suppress inhibition.

      (5) While the AA-LTP depends on NMDA-Rs, which cell type is responsible is not clear. Recording NMDA components in AA/PF-EPSCs should be informative in addressing this point. Cesana et al suggested that AA induces significant activation of NMDA-Rs in Golgi cells (PMID: 23884948). Whether AA stimuli could significantly evoke NMDA current in the experimental condition used in this paper could provide essential information.

      The granule cell to Purkinje cell EPSCs are devoid of an NMDAR component (Llano et al., 1991), and there are no postsynaptic NMDARs at granule cell to PC synapses, but a proportion of presynaptic boutons show the presence of NMDARs (Bidoret et al, 2009). This is now stated clearly on p8. Presynaptic NMDARs have been involved in LTP and LTD of parallel fibre synapses (Casado et al., 2002; Bouvier et al., 2016; Schonewille et al., 2021), and linked to the activation of NOS in granule cell axons. However, we do not know whether presynaptic NMDARs are also present at AA synapses. NMDAR and NOS are also expressed by molecular layer interneurons, and have sometimes been involved in LTD induction (Kono et al., 2019), although this is disputed. In the paper by Cesana (2013), white matter stimulation activated mossy fibre inputs to granule cells, and as a consequence, granule cell to Golgi cell disynaptic EPSCs. The authors identified AA synapses on the basolateral dendrites of Golgi cells, and showed NMDAR activation associated with the mossy fibre to granule cell EPSC. Granule cell to Golgi cell synapses were shown to activate both postsynaptic AMPA and NMDA receptors (Dieudonné, 1999). But to our knowledge, Golgi cells do not express NOS. Therefore it is unlikely that activation of NMDARs in Golgi cells is linked to synaptic plasticity in Purkinje cells.

      (6) Pharmacological experiments in Figure 3 show that AA-LTP is dependent on mGluR. The authors mentioned that it could be explained by the presence and absence of mGluRs in PFs and AAs, respectively. This is an important and reasonable possibility and should be tested. The authors could simply check whether slow EPSCs can be recorded by the AA activation.

      Activation of the mGluR slow EPSC by AA stimulation would reveal the presence of mGluRs at AA inputs. We know, however, that sparse PF stimulation does not activate the mGluR slow EPSC nor endocannabinoid release unless glutamate transporters are blocked (Marcaggi and Attwell, 2005). This is thought to reflect insufficient glutamate buildup in the sparse configuration to activate mGluR1s. AA inputs are sparsely distributed and are not expected to activate the slow EPSC either, and this is confirmed by our own experiments (CA personal communication). However, mGluR1 mediated Ca2+ release from stores shows a higher sensitivity to glutamate than the slow EPSC (Canepari and Ogden, 2006) and might take place with sparse inputs, but Ca2+ signals have not been investigated in this configuration. Therefore the absence of the slow EPSC is not sufficient proof that mGluR1s are not activated and not present at AA synapses. This is now further discussed on p12.

      Minor points:

      (1) The authors should describe how they adjusted the stimulation strength for both AAs and PFs.

      Adjustment of the stimulation intensity is now described in the Methods section.

      (2) A rationale explaining why the authors chose the current induction protocol (synchronous stimulation of both inputs) should be included. This will help the readers to understand the background of the study.

      Papers by Sims and Hartell (2005, 2006) and experimental evidence indicated that AA and PF inputs may have different properties, and as a result may play different roles. Moreover, based on the morphology of the cerebellar granule cell and Purkinje cell, AA and PF inputs can carry different information to a given Purkinje cell. We reasoned that co-presentation of the inputs might represent an important piece of information for the circuit, signalling functional association, and lead to plasticity, as seen for motor command and sensory feedback in cerebellar-like structures, or for PF and climbing fibre. We have tried to convey that rationale in the abstract and introduction.

      (3) Supplemental Figure 2B: the x-axis may be labeled incorrectly. Is the x-axis of the top graph for PF-EPSC? The x-axis for the bottom graphs should be the summation of AA- and PF-EPSCs.

      This has been corrected.

      (4) "mglur1" on page 10 should be mGluR1.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please reorder the supplementary figures in the order they are referred to in the Results section for ease of reading. Supp Fig 5 b - should read ‘Mean normalized fluorescence of LC ROIs (n = 87) during immobile periods aligned to the switch from familiar to novel environment.’

      We thank the reviewer for highlighting these issues and have reordered the supplementary figures and edited the figure legends appropriately.

      Reviewer #2 (Recommendations For The Authors):

      The authors should include sample size justifications (e.g. based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors).

      In response to this concern, we have added a statement to the “Imaging Sessions” section of the methods. Here we highlight that sample sizes were largely based on previous studies and/or limited by the difficulty of recordings and the limited number of visible axons per imaging session.

      Reviewer #3 (Recommendations For The Authors):

      The addition of Supp. Fig 5 partially addresses my previous point 3. However, the claim of dissociation between VTA-CA1 and LC-CA1 would be strengthened by showing that VTA-CA1 axons do not respond to the darkness -> familiar environment in Supp Fig 5. This is particularly important given that (1) the additional 2 VTA-CA1 axons in the revision were not recorded during transitions to novel environments and (2) the overall concern of the reviewers that the low n and heterogeneity of the VTA-CA1 dataset may lead to a false negative. Providing VTA-CA1 data for the darkness -> familiar environment would provide a within-manuscript replication that these axons are not responding to environment changes; a major claim of this manuscript.

      While we agree that data of VTA-CA1 axons during the switch from darkness to the familiar environment would provide additional evidence that these axons are not responding to environment changes, unfortunately, VTA axons were not recorded during the switch from familiar to novel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. Because the anatomic features in the new specimens were neither properly revealed nor correctly interpreted, the evidence for several conclusions is inadequate. 

      We thank the Senior Editor, Reviewing Editor and three reviewers for their work, and for their comments aimed at improving this project and manuscript. We have engaged with all the comments in detail, in order to strengthen our work. This includes adding additional data to support that all Acanthomeridion specimens belong to a single species, running further phylogenetic analyses including more trilobite terminals to test the specific hypothesis and interpretation raised by Reviewer 2, and visualising our results in treespace in order to determine support for the different interpretations of the ventral structures and their implications for the evolution of Artiopoda. We have also greatly expanded the introduction, which we feel adds clarity to areas misunderstood by some reviewers in the previous version of the manuscript.

      Our point-by-point responses to the public reviews are outlined below. We have also made changes resulting from the additional suggestions which are not public, which we have not reproduced below. We submit a new version of the main text, and can provide a tracked changes version if required. The new main text includes 9 figures and is 8624 words including captions and reference list.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. Authors assigned their specimens to Acanthomeridion serratum and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic liberigenae), together with phylogenic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda. 

      We thank Reviewer 1 for their comments on the strengths and weaknesses of the previous version of the manuscript. We hope that the revised version strengthens our conclusions that Acanthomeridion anacanthus is a junior synonym of A. serratum.

      Strengths: 

      New specimens are highly qualified and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenic analyses. 

      Weaknesses: 

      The weaknesses of this work are obvious in a number of aspects. Technically, ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might be helpful to resolve the taxonomic and ontogenic uncertainties. 

      We appreciate that the reviewer was not convinced by our synonymisation in the first version of the manuscript. The recommendation of the reviewer to provide linear morphometric support for our synonymisation was much appreciated. We have provided measurements of the length and width of the thorax (Figure 6 in the new version), visualising the position of specimens previously assigned to A. anacanthus, to show this morphological continuity. These act as a complement to Figure 5, which shows the fossils in an ontogenetic trend.

      I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. hypostome, and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed either morphologically or functionally.

      We appreciate that the brevity of the introduction in the previous version led to some misunderstandings and some confusion. We have provided a greatly expanded introduction, including a new Figure 1, which outlines the possible homologies of the ventral plates and the three hypotheses considered in this study. The functions of the cephalic and dorsal sutures are now discussed in more detail in both the introduction and the discussion.

      Finally, the authors claimed that phylogenic results support two separate origins rather than a deep origin. However, the results in Figure 4 can explain a deep homology of the cephalic suture at molecular level and multiple co-options within the Artiopoda. 

      A deep molecular origin is difficult to demonstrate using solely fossil material from an extinct group such as Artiopoda. Thus our study focuses on morphological origins. The number of losses required for a deep morphological origin means that we favour multiple independent morphological origins.

      Reviewer #2 (Public Review): 

      Overall: This paper describes new material of Acanthomeridion serratum that the authors claim supports its synonymy with Acanthomeridion anacanthus. The material is important and the description is acceptable after some modification. In addition, the paper offers thoughts and some exploration of the possibility of multiple origins of the dorsal facial suture among artiopods, at least once within Trilobita and also among other non-trilobite artiopods. Although this possibility is real and apparently correct, the suggestions presented in this paper are both surprising and, in my opinion, unlikely to be true because the potential homologies proposed with regard to Acanthomeridion and trilobite-free cheeks are unconventional and poorly supported. 

      What to do? I can see two possibilities. One, which I recommend, is to concentrate on improving the descriptive part of the paper and omit discussion and phylogenetic analysis of dorsal facial suture distribution, leaving that for more comprehensive consideration elsewhere. The other is to seek to improve both simultaneously. That may be possible but will require extensive effort. 

      We thank the reviewer for their detailed comments and suggestions for multiple ways in which we might revise the manuscript. We have taken the option that is more effort, but we hope more reward, in interrogating the larger question alongside improving the descriptive part of the paper. This has taken a long time and incorporation of new techniques, but has in our opinion greatly strengthened the work.

      Major concerns 

      Concern 1 - Ventral sclerites as free cheek homolog, marginal sutures, and the trilobite doublure 

      Firstly, a couple of observations that bear on the arguments presented - the eyes of A. serratum are almost marginal and it is not clear whether a) there is a circumocular suture in this animal and b) if there was, whether it merged with the marginal suture. These observations are important because this animal is not one in which an impressive dorsal facial suture has been demonstrated - with eyes that near marginal it simply cannot do so. Accordingly, the key argument of this paper is not quite what one would expect. That expectation would be that a non-trilobite artiopod, such as A. serratum, shows a clear dorsal facial suture. But that is not the case, at least with A. serratum, because of its marginal eyes. Rather, the argument made is that the ventral doublure of A. serratum is the homolog of the dorsal free cheeks of trilobites. This opens up a series of issues. 

      We appreciate that the reviewer disagrees with both interpretations we offered for the ventral plates, and has offered a third interpretation for the homology of this feature with the doublure of trilobites. Support for our original interpretation comes from the position of the eye stalks in Acanthomeridion, which fall very close to the suture between the ventral plate and the rest of the cephalon. However, we appreciate that the reviewer has a valid interpretation, that the ventral plates might be homologues of the doublure alone.

      To clarify the (two, now three) hypotheses of homology for the ventral plates considered in this study, we provide a new summary figure (Figure 1). In addition, the introduction has been greatly lengthened with further discussion of the different suture types in trilobites, their importance for trilobite classification schemes, and extensive references to older literature are now included. Further, we add background to the hypotheses around the origins of dorsal ecdysial sutures. 

      We add that the interpretation of A. serratum as having features homologous to the dorsal sutures of trilobites is already present in the literature, and so while the reviewer may disagree with it, it is certainly a hypothesis that requires testing.

      The paper's chief claim in this regard is that the "teardrop" shaped ventral, lateral cephalic plates in Acanthomeridion serratum are potential homologs of the "free cheeks" of those trilobites with a dorsal facial suture. There is no mention of the possibility that these ventral plates in A. serratum could be homologs of the lateral cephalic doublure of olenelloid trilobites, which is bound by an operative marginal suture or, in those trilobites with a dorsal facial suture, that it is a homolog of only the doublure portions of the free cheeks and not with their dorsal components. 

      We include this third possibility in our revised analyses and manuscript. To test this properly required adding in an olenelloid trilobite to our matrix, as we needed a terminal that had both a marginal and a circumocular suture, but not fused. We chose Olenellus getzi for this purpose, as it is the only Olenellus with some appendages known (the antennae). We also added further characters to the morphological matrix, and additional trilobites from which soft tissues are known, in order to better resolve this part of the tree. Trilobites in the final analyses were: Anacheirurus adserai, Cryptolithus tesselatus, Eoredlichia intermedia, Olenoides serratus, Olenellus getzi, Triarthrus eatoni.

      However, addition of these trilobites added a further complication. Under unconstrained analysis, Olenellus getzi was resolved with Eoredlichia intermedia as a clade sister to all other trilobites.

      Thus the topology of Paterson et al. 2019 (PNAS) was not recovered, and so the hypothesis of Reviewer 2 could not be robustly tested. In order to achieve a topology comparable to Paterson et al., we ran a further three analyses, where we constrained a clade of all trilobites except for O. getzi. This recovered a topology where the earliest diverging trilobites had unfused sutures, and thus one suitable for considering the role of Acanthomeridion serratum ventral plates as homologues of the doublure of trilobites.

      Unfortunately, for these analyses (both constrained and unconstrained), Acanthomeridion was not resolved as sister to trilobites, but instead elsewhere in the tree (see Table 1 in main text, Fig. 9, and SFig 9). Thus our analyses do not find support for the reviewer’s hypothesis, as multiple origins of this feature are still required.

      It was still an excellent point that we should consider this hypothesis, and we have retained it, and discussion surrounding it, in our manuscript.

      The introduction to the paper does not inform the reader that all olenelloids had a marginal suture - a circumcephalic suture that was operative in their molting and that this is quite different from the situation in, say, "Cedaria" woosteri in which the only operative cephalic exoskeletal suture was circumocular. The conservative position would be that the olenelloid marginal suture is the homolog of the marginal suture in A. serratum: the ventral plates thus being homolog of the trilobite cephalic doublure, not only potential homolog to the entire or dorsal only part of the free cheeks of trilobites with a dorsal facial suture. As the authors of this paper decline to discuss the doublure of trilobites (there is a sole mention of the word in the MS, in a figure caption) and do not mention the olenelloid marginal suture, they give the reader no opportunity to assess support for this alternative. 

      At times the paper reads as if the authors are suggesting that olenelloids, which had a marginal cephalic suture broadly akin to that in Limulus, actually lacked a suture that permitted anterior egression during molting. The authors are right to stress the origin of the dorsal cephalic suture in more derived trilobites as a character seemingly of taxonomic significance but lines such as 56 and 67 may be taken by the non-specialist to imply that olenelloids lacked a forward egression-permitting suture. There is a notable difference between not knowing whether sutures existed (a condition apparently quite common among soft-bodied artiopods) and the well-known marginal suture of olenelloids, but as the MS currently reads most readers will not understand this because it remains unexplained in the MS. 

      As noted in response to a previous point (above) we now have a greatly expanded introduction which should give the reader an opportunity to assess support for this alternative hypothesis. We now include Olenellus getzi in our analyses, and have added characters to the morphological matrix to make this clear.

      A reference to the case of ‘Cedaria’ woosteri is made in the introduction to highlight further the variability of trilobites, as is a reference to Foote’s analysis of cranidial shapes and the support this provides for a single origin of the dorsal suture.

      With that in mind, it is also worth further stressing that the primary function of the dorsal sutures in those which have them is essentially similar to the olenelloid/limulid marginal suture mentioned above. It is notable that the course of this suture migrated dorsally up from the margin onto the dorsal shield and merged with the circumocular suture, but this innovation does not seem to have had an impact on its primary function - to permit molting by forward egression. Other trilobites completely surrendered the ability to molt by forward egression, and there are even examples of this occurring ontogenetically within species, suggesting a significant intraspecific shift in suture functionality and molting pattern. The authors mention some of this when questioning the unique origin of the dorsal facial suture of trilobites, although I don't understand their argument: why should the history of subsequent evolutionary modification of a character bear on whether its origin was unique in the group? 

      We include reference to evolutionary modification and loss of this character as it is important to stress that if a character is known to have been lost multiple times it is possible that it had a deeper root (in an earlier diverging member of Artiopoda than Trilobita) and was lost in olenelloids. This is the question that we seek to address in our manuscript.

      The bottom line here is that for the ventral plates of A. serratum to be strict homologs of only the dorsal portion of the dorsal free cheeks, there would be no homolog of the trilobite doublure in A. serratum. The conventional view, in contrast, would be that the ventral plates are a homolog of the ventral doublure in all trilobites and ventral plates in artiopods. I do not think that this paper provides a convincing basis for preferring their interpretation, nor do I feel that it does an adequate job of explaining issues that are central to the subject. 

      We stress that our interpretations (that the ventral plates are not homologous to any artiopodan feature, or that they are homologous to the free cheeks of trilobites) have both been raised in the literature before, whereas we could not find mention of the reviewer’s ‘conventional view’ relating to Acanthomeridion. We appreciate that this view is still valid and worth investigating, which we have done in the further analyses conducted. However, we did not find support for it. Instead we find some support for both ventral plates as homologues of free cheeks, and as unique structures within Artiopoda.

      Concern 2. Varieties of dorsal sutures and the coexistence of dorsal and marginal sutures 

      The authors do not clarify or discuss connections between the circumocular sutures (a form of dorsal suture that separates the visual surface from the rest of the dorsal shield) and the marginal suture that facilitates forward egression upon molting. Both structures can exist independently in the same animal - in olenelloids for example. Olenelloids had both a suture that facilitated forward egression in molting (their marginal suture) and a dorsal suture (their circumocular suture). The condition in trilobites with a dorsal facial suture is that these two independent sutures merged - the formerly marginal suture migrating up the dorsal pleural surface to become confluent with the circumocular suture. (There are also interesting examples of the expansion of the circumocular suture across the pleural fixigena.) The form of the dorsal facial suture has long figured in attempts at higher-level trilobite taxonomy, with a number of character states that commonly relate to the proximity of the eye to the margin of the cephalic shield. The form of the dorsal facial suture that they illustrate in Xanderella, which is barely a strip crossing the dorsal pleural surface linking marginal and circumocular suture, is comparable to that in the trilobites Loganopeltoides and Entomapsis but that is a rare condition in that clade as a whole. The paper would benefit from a clear discussion of these issues at the beginning - the dorsal facial suture that they are referring to is a merged circumcephalic suture and circumocular suture - it is not simply the presence of a molt-related suture on the dorsal side of the cephalon. 

      We have added in an expanded introduction where these points are covered in detail. We appreciate that this was not clear in the earlier version, and this suggestion has greatly improved our work.

      Concern 3. Phylogenetics 

      While I appreciate that the phylogenetic database is a little modified from those of other recent authors, still I was surprised not to find a character matrix in the supplementary information (unless it was included in some way I overlooked), which I would consider a basic requirement of any paper presenting phylogenetic trees - after all, there's no space limit. It is not possible for a reviewer to understand the details of their arguments without seeing the character states and the matrix of state assignments. 

      A link to a morphobank project was included in the first submission. This project has been updated for the current submission, including an additional matrix to treat the reviewer’s hypothesis for the ventral plates. Morphobank Project #P4290. Email address: P4290, reviewer password: Acanthomeridion2023, accessible at morphobank.org. We have added in additional details for the reviewer and others to help them access the project:

      The project can be accessed at morphobank.org, using the below credentials to log in:  Email address: P4290, Password: Acanthomeridion 2023.

      The section "phylogenetic analyses" provides a description of how tree topology changes depending on whether sutures are considered homologous or not using the now standard application of both parsimony and maximum likelihood approaches but, considering that the broader implications of this paper rest on the phylogenetic interpretation, I also found the absence of detailed discussion of the meaning and implications of these trees to be surprising, because I anticipated that this was the main reason for conducting these analyses. The trees are presented and briefly described but not considered in detail. I am troubled by "Circles indicate presence of cephalic ecdysial sutures" because it seems that in "independent origin of sutures" trilobites are considered to have two origins (brown color dot) of cephalic ecdysial sutures - this may be further evidence that the team does not appreciate that olenelloids have cephalic ecdysial sutures, as the basal condition in all trilobites. Perhaps I'm misunderstanding their views, but from what's presented it's not possible to know that. Similarly, in the "sutures homologous" analyses why would there be two independent green dots for both Acanthomeridion and Trilobita, rather than at the base of the clade containing them both, as cephalic ecdysial sutures are basal to both of them? Here again, we appear to see evidence that the team considers dorsal facial sutures and cephalic ecdysial sutures to be synonymous - which is incorrect.  

      We appreciate that the reviewer misunderstood the meaning of the dots, leading to confusion. The dots indicated how features were coded in the phylogenetic analysis. In our revised version of this figure (Figure 8 in the new version), these dots are now clearly labelled as indicating ‘coding in phylogenetic matrix’. Further, with the revised character list, we now can provide additional detail for the types of sutures (relevant as we now include more trilobite terminals).

      This point aside, and at a minimum, that team needs to do a more thorough job of characterizing and considering the variety of conditions of dorsal sutures among artiopods, their relationships to the marginal suture and to the circumocular suture, the number, and form of their branches, etc. 

      We thank the reviewer for this summary, and appreciate their concerns and thorough review. Our revised version takes into account all these points raised, and they have greatly improved the clarity, scope and thoroughness of the work.

      Reviewer #3 (Public Review): 

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are shown to be associated with ventral plates that the authors very reasonably homologise with the free cheeks of trilobites. A slight update of a phylogenetic dataset developed by Du et al, then refined slightly by Chen et al, then by Schmidt et al, and again here, permits another attempt to optimise the number of origins of dorsal ecdysial sutures in trilobites and their relatives. 

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variations within a single species. New microtomographic data shed some light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites. 

      We thank Reviewer 3 for their positive comments about the manuscript. We appreciate the constructive comments for improvements, and detailed corrections, which we have incorporated into our revised work.

      Weaknesses:

      The main conclusion remains clouded in ambiguity because of a poorly resolved Bayesian consensus and is consistent with work led by the lead author in 2019 (thus compromising the novelty of the findings). The Bayesian trees being majority rules consensus trees, optimising characters onto them (Figure 7b, d) is problematic. Optimising on a consensus tree can produce spurious optimisations that inflate tree length or distort other metrics of fit. Line 264 refers to at least three independent origins of cephalic sutures in artiopodans but the fully resolved Figure 7c requires only two origins. 

      We thank the reviewer for pointing this out. However, now that the analyses have been re-run, we have new results to consider. The results still support multiple origins of sutures. We also note that the dots were indicating how terminals were coded. This is now clearer in the revised version of this figure (Figure 8 in the new version).

      We have extended our interrogation of the trees by incorporating treespace analyses. These add support for the nodes of interest (around the base of trilobites), showing that the coding of Acanthomeridion ventral plate homologies impacts its position in the tree, and thus has implications for our understanding of the evolution of sutures in trilobites.

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion, and Trilobita. To their credit, the authors acknowledge this (lines 62-65). The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). 

      The following points are not meant to be "Weaknesses" but rather are refinements: 

      I recommend changing the title of the paper from "cephalic sutures" to "dorsal ecdysial sutures" to be more precise about the character that is being tracked evolutionarily. Lots of arthropods have cephalic sutures (e.g., the ventral marginal suture of xiphosurans; the Y-shaped dorsomedian ecdysial line in insects). The text might also be updated to change other instances of "cephalic sutures" to a more precise wording. 

      We appreciate this point and have changed the title as suggested. 

      The authors have provided (but not explicitly identified) support values for nodes in their Bayesian trees but not in their parsimony ones. Please do the jackknife or bootstrap for the parsimony analyses and make it clear that the Bayesian values are posterior probabilities. 

      With the addition of further trilobite terminals to our parsimony analyses, the results became poor. Specifically, the internal relationships of trilobites did not conform to any previous study, and Olenellus getzi was not resolved as an early diverging member of the group. This meant that these analyses could not be used for addressing the hypothesis of Reviewer 2. We decided to exclude reporting parsimony analysis results from this version to avoid confusion.

      We have added a note that the values reported at the nodes are posterior probabilities to figures S8, S9 and S10 where we show the full Bayesian results.

      In line 65 or somewhere else, it might be noted that a single origin of the dorsal facial sutures in trilobites has itself been called into question. Jell (2003) proposed that separate lineages of Eutrilobita evolved their facial sutures independently from separate sister groups within Olenellina. 

      We have added this to the introduction (Line 98). Thank you for raising this point.

      I have provided minor typographic or terminological corrections to the authors in a list of recommendations that may not be publicly available. 

      We appreciate the points made by the reviewer and their detailed corrections, which we have incorporated in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper the authors provide a characterisation of auditory responses (tones, noise, and amplitude modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group have previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that DC and LC in both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude modulated noise) compared to pure tones and that under anaesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anaesthetised mice somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little in the matrix. The authors now compare bimodal interactions in the anaesthetised and awake states and find that effects differ in some cases between the two - particularly related to bimodal suppression and enhancement in the modules.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      The manuscript is improved by the response to reviewers. The authors have addressed my comments by adding new figures and panels, streamlining the analysis between awake and anaesthetised data (which has led to a more nuanced, and better supported conclusion), and adding more examples to better understand the underlying data. In streamlining the analyses between anaesthetised and awake data I would probably have opted for bringing these results into merged figures to avoid repetitiveness and aid comparison, but I acknowledge that that may be a matter of style. The added discussions of differences between awake and anaesthesia in the findings and the discussion of possible reasons why these differences are present help broaden the understanding of what the data looks like and how anaesthesia can affect these circuits.

      As mentioned in my previous review, the strength of this study is in its demonstration of using prism 2p imaging to image the lateral shell of IC to gain access to its neurochemically defined subdivisions, and they use this method to provide a basic description of the auditory and multisensory properties of lateral cortex IC subdivisions (and compare it to dorsal cortex of IC). The added analysis, information and figures provide a more convincing foundation for the descriptions and conclusions stated in the paper. The description of the basic functionality of the lateral cortex of the IC is useful for researchers interested in basic multisensory interactions and auditory processing and circuits. The paper provides a technical foundation for future studies (as the authors also mention), exploring how these neurochemically defined subdivisions receiving distinct descending projections from cortex contribute to auditory and multisensory based behaviour.

      Minor comment:

      - The authors have now added statistics and figures to support their claims about tonotopy in DC and LC. This is what I asked for, and I think it allows readers to better understand the tonotopical organisation in these areas. One of the conclusions by the authors is that the quadratic fit is a better fit than a linear fit in DCIC. Given the new plots shown and previous studies this is likely true, though it is worth highlighting that adding parameters to a fitting procedure (as in the case when moving from linear to quadratic fit) will likely lead to a better fit due to the increased flexibility of the fitting procedure.

      Thank you for the suggestion. We have highlighted that the quadratic function allowed the regression model to include the cells tuned to higher frequencies at the rostromedial part of the DC, resulting in a better fit, which is consistent with the previously described tonotopic organization, as shown in the text (lines 208-211).
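
      The reviewer's caution about added parameters can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic, and the helper `fit_aic`, the variable names, and the choice of the Akaike information criterion (AIC) as the complexity penalty are all assumptions made for illustration. A nested quadratic model can never have a larger residual sum of squares than the linear one, so a penalised criterion is one way to check whether the improvement exceeds what the extra flexibility alone would give.

```python
import numpy as np


def fit_aic(x, y, degree):
    """Least-squares polynomial fit (hypothetical helper).

    Returns the residual sum of squares (RSS) and the AIC under the
    assumption of independent Gaussian residuals.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(x)
    k = degree + 2  # polynomial coefficients plus the residual variance
    aic = n * np.log(rss / n) + 2 * k
    return rss, aic


# Synthetic, purely illustrative data: a curved relationship between a
# normalised position along an axis and a tuning value, plus noise.
rng = np.random.default_rng(0)
position = np.linspace(0.0, 1.0, 200)
tuning = 2.0 + 8.0 * (position - 0.5) ** 2 + rng.normal(0.0, 0.3, position.size)

rss_lin, aic_lin = fit_aic(position, tuning, 1)
rss_quad, aic_quad = fit_aic(position, tuning, 2)

# RSS always favours (or ties with) the more flexible model;
# AIC favours it only if the gain outweighs the extra parameter.
print(f"linear:    RSS={rss_lin:.2f}  AIC={aic_lin:.1f}")
print(f"quadratic: RSS={rss_quad:.2f}  AIC={aic_quad:.1f}")
```

      With real data, a lower AIC for the quadratic model would indicate that the better fit is not explained by the additional parameter alone; an F-test on the nested models or cross-validation would serve the same purpose.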

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      A major achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons) and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it and the writing is not quite as precise as it could be.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were overall more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was different in the awake prep, where modular neurons became more responsive to somatosensory stimuli. Thus, to this reviewer, one of the most intriguing results of the present study is the extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggests that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, the limitations of two-photon imaging for tracking neural activity are acknowledged, and appropriate statistical tests were used.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Increase font size of scale bars on figure 6.

      Thank you for the suggestion. We have increased the font size of the scale bar.

      Reviewer #2 (Recommendations For The Authors):

      Line 505: typo: 'didtinction'

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 506).

      No further comments.

      Reviewer #3 (Recommendations For The Authors):

      Line 543: Change "contripute" to "contribute"

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 544).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      The authors indicated that the adherence of ETEC is to intestinal epithelial cells. However, it is also possible that the majority of ETEC may reside in the intestinal mucus, particularly under in vivo infection conditions. The colonization of ETEC in the jejunum and colon of piglets (Fig 2C) and in the intestines of mice (Fig S2A) does not necessarily reflect the adherence of ETEC to epithelial cells. Please verify these observations with other methods, such as immunostaining. Also, while Salmonella enterica serovar Typhimurium or Listeria monocytogenes can invade organoids within 1 hour, it is unknown whether ETEC invade organoids in this study. Clarifying this will help resolve whether A. muciniphila blocks the adherence and/or invasion of ETEC. Please also address whether A. muciniphila metabolites could prevent ETEC infection in the organoid models.

      In the original manuscript, the sentence “ETEC K88 adheres to intestinal epithelial cells and induces gut inflammation (Yu et al., 2018)” in line 447 was a cited statement used to connect the preceding and following text; it is not our result. We have deleted this sentence on line 457. Previous studies have shown that ETEC enter intestinal epithelial cells after only one hour of infection (Xiao et al., 2022; Qian et al., 2023). Whether A. muciniphila metabolites prevent ETEC infection in the organoid models is not the focus of this manuscript; it may be explored further by other members of the research group in the future.

      References:

      Xiao K, Yang Y, Zhang Y, Lv QQ, Huang FF, Wang D, Zhao JC, Liu YL. 2022. Long-chain PUFA ameliorate enterotoxigenic Escherichia coli-induced intestinal inflammation and cell injury by modulating pyroptosis and necroptosis signaling pathways in porcine intestinal epithelial cells. Br. J. Nutr. 128(5):835-850.

      Qian MQ, Zhou XC, Xu TT, Li M, Yang ZR, Han XY. 2023. Evaluation of Potential Probiotic Properties of Limosilactobacillus fermentum Derived from Piglet Feces and Influence on the Healthy and E. coli-Challenged Porcine Intestine. Microorganisms. 11(4).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      After revision, the bioinformatics section of the methods is still jumbled and may indicate issues in the pipeline. Important parameters are not included to replicate analyses. Merging the forward and reverse reads may represent a problem for denoising. Chimera detection was performed prior to denoising.

      Potential denoising issues for NovaSeq data were not addressed in the response. The authors did not clarify whether multiple testing correction was applied; as written, it may be assumed that it was not. The raw sequencing data made available through the SRA accession (if for the correct project) indicate a MiSeq platform; however, the sample names do not appear to link up to this experimental design, and the metadata are not sufficient to replicate the analyses.

      We have rewritten the description of the microbiome sequencing analysis method on lines 298-327.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      SRA accession must be confirmed and metadata made available.

      We updated the SRA data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the first paragraph of the Results section, it is not clear why the authors introduce the function of p53ΔAS/ΔAS in thymocytes and then mention fibroblasts. The authors should clarify this point. The authors should also explain the rationale for using doxorubicin and Nutlin to analyze p53 activity (Figure 1 and Figure S1).

      We thank the reviewer for this comment. In the revised manuscript, we corrected this by mentioning, at the beginning of the Results section: “We analyzed cellular stress responses in thymocytes, known to undergo a p53-dependent apoptosis upon irradiation (Lowe et al., 1993), and in primary fibroblasts, known to undergo a p53-dependent cell cycle arrest in response to various stresses - e.g. DNA damage caused by irradiation or doxorubicin (Kastan et al., 1992), and the Nutlin-mediated inhibition of Mdm2, a negative regulator of p53 (Vassilev et al., 2004).”

      (2) The authors should provide quantification for the western blot in figure 2D because the reduction of p53 protein level in mutant vs wt tumors is not striking. 

      In the previous version of the manuscript, the quantification of p53 bands had been included, but quantification results were mentioned below the actin bands, rather than the p53 bands, and this was probably confusing. We have corrected this in the revised version of the manuscript. The quantification results are now provided just below the p53 bands in Figs. 1B and 2D, which should clarify this point. For Figure 2D, the quantifications show a strong decrease in p53 levels for 3 out of 4 analyzed mutant tumors. For consistency purposes, in the revised manuscript the quantification results also appear below Myc bands in Fig. 2C.

      (3) In the Discussion section, the authors propose that a difference in Ackr4 expression may have prognostic value and that measuring ACKR4 gene expression in male patients with Burkitt lymphoma could be useful to identify the patients at higher risk. However, the authors perform many correlative analyses, both in mice and in patients, but the manuscript lacks functional experiments that could help to functionally characterize Ackr4 and Mt2 in the etiology of B-cell lymphomas in males (both in mouse and in human models).

      In the previous version of the manuscript, we proposed that Ackr4 might act as a suppressor of B-cell lymphomagenesis by attenuating Myc signaling. This hypothesis relied on studies showing that Ackr4 impairs the Ccr7 signaling cascade, which may lead to decreased Myc activity (Ulvmar et al., 2014; Shi et al., 2015; Bastow et al., 2021) and that the loss of Ccr7 may delay Myc-driven lymphomagenesis (Rehm et al., 2011). Furthermore, we proposed that the increased expression of Mt2 in p53ΔAS/ΔAS Eμ-Myc male splenic cells reflected an increase in Myc activity, because Mt2 is known to be regulated by Myc (Qin et al., 2021) and because the Mt2 promoter is bound by Myc in B cells according to experiments reported in the ChIP-Atlas database. However, in the first version of the manuscript this hypothesis might have appeared only partially supported by our data because an increase in Myc activity could be expected to have a more general impact, i.e. an impact not only on the expression of Mt2, but also on the expression of many canonical Myc target genes. In the revised manuscript, we show that this is indeed the case. We performed a gene set enrichment analysis (GSEA) comparing the RNAseq data from p53ΔAS/ΔAS Eμ-Myc and p53+/+ Eμ-Myc male splenic cells and found an enrichment of hallmark Myc targets in p53ΔAS/ΔAS Eμ-Myc cells. These new data, which strengthen our hypothesis of differences in Myc signaling intensity, are presented in Fig. 3K and Table S2.

      Importantly, we now go beyond correlative analyses by providing direct experimental evidence that ACKR4 affects the behavior of Burkitt lymphoma cells. We used a CRISPR-Cas9 approach to knock out ACKR4 in Raji Burkitt lymphoma cells and found that ACKR4 KO cells exhibited a 4-fold increase in chemokine-guided cell migration. These new data are presented in Figure 4F and the supplemental Figures S5-S7.

      Finally, following a suggestion of Reviewer#2, we now also point out that “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.”

      In sum, we now mention in the Discussion that a decrease in Ackr4 expression might promote B-cell lymphomagenesis through three non-exclusive mechanisms.

      Reviewer #2 (Recommendations For The Authors): 

      (1) A great addition would be to demonstrate how p53AS specifically contributes to the regulation of Ackr4. In particular, is there evidence that p53AS might be preferentially recruited on p53 RE within that gene as compared to WT? The availability of specific antibodies that distinguish between AS and WT p53 might help to address this (experimentally complex) question. As a note, usage of such antibodies would also strengthen Fig 1B, in which the AS isoform appears as a mere faint shadow under p53, thus making its "disappearance" in trp53ΔAS/ΔAS difficult to evaluate. 

      We agree with the referee that efficient antibodies against p53-AS isoforms would have been useful. In fact, we tried a non-commercial antibody developed for that purpose, but it produced many nonspecific bands in western blots and did not appear reliable. Importantly, however, our luciferase assays clearly show that both p53-α and p53-AS can transactivate Ackr4, a result that might be expected because these isoforms share the same DNA binding domain. Furthermore, because p53-α isoforms appear more abundant than p53-AS isoforms at the protein and RNA levels (Figs. 1B and S1A), and because the loss of p53-AS isoforms leads to a significant decrease in p53-α protein levels (Figs. 1B and 2D), we think that in p53ΔAS/ΔAS cells the reduction in p53-α levels might be the main reason for a decreased transactivation of Ackr4. This is now more clearly discussed in the revised manuscript.

      (2) A most interesting observation is in Fig 3A and Fig S3, showing that spleen cells of p53ΔAS Eμ-Myc males (but not females) were enriched in pre-B and immature B cells as compared to WT counterparts. This observation points to a possible defect in the B cell maturation process. It would be most interesting to determine whether this particular defect is directly mediated by a p53AS-Ackr4 axis. The hypothesis raised by the authors in the Discussion section is that increased Ackr4 expression may delay lymphomagenesis, but data in Fig 3A and S3 actually suggest that ΔAS increases the pool of immature B cells that may be prone to lymphomagenesis.

      We thank the reviewer for this useful comment, which we integrated in the Discussion of the revised manuscript. Ackr4 was shown to regulate B cell differentiation (Kara et al. (2018) J Exp Med 215, 801–813), so this is indeed one of the possible mechanisms by which a deregulation of the p53-Ackr4 axis might promote lymphomagenesis. We now mention: “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.” This is presented as one of three possible mechanisms by which decreased Ackr4 levels may promote tumorigenesis, the two others being the impact of Ackr4 on the chemokine-guided migration of lymphoma cells and its apparent effect on Myc signalling.

      (3) The concordance with a male-specific prognostic effect of Ackr4 is most interesting in itself but is only of correlative evidence with respect to the study. Is there any information on whether p53AS expression is also a prognostic factor in BL? And is there evidence that Ackr4 may also be a male-specific prognostic factor in other B-cell malignancies, e.g. Multiple Myeloma?

      We have now performed the CRISPR-mediated knock-out of ACKR4 in Burkitt lymphoma cells and found that it leads to a dramatic increase in chemokine-guided cell migration, which goes beyond correlation. This significant new result is mentioned in the revised abstract and presented in detail in Figures 4F and S5-S7.

      Regarding p53-AS isoforms, they are murine-specific isoforms (Marcel et al. (2011) Cell Death Diff 18, 1815-1824), so there is no information on p53-AS expression in Burkitt lymphoma. Human p53 isoforms with alternative C-terminal domains are the p53β and p53γ isoforms, but the datasets we analyzed did not provide any information on the relative levels of p53α (the canonical isoform), p53β or p53γ isoforms. We agree with the referee that this is an interesting question, but one that cannot be answered with currently available datasets.

      Regarding the different types of B-cell malignancies, we had already shown that Ackr4 is a male-specific prognostic factor in Burkitt lymphomas but not in Diffuse Large B cell lymphomas, which indicated that it is not a prognostic factor in all types of B cell lymphomas. For this revision, we also searched for its potential prognostic value in multiple myeloma, and found that, as for DLBCL, it is not a prognostic factor in this cancer type. This new analysis is presented in Figure S4C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: This article explores the role of Ecdysone in regulating female sexual receptivity in Drosophila. The researchers found that PTTH, through its role as a positive regulator of ecdysone production, negatively affects the receptivity of adult virgin females. Indeed, loss of larval PTTH before metamorphosis significantly increases female receptivity right after adult eclosion and also later. However, during metamorphic neurodevelopment, Ecdysone, primarily through its receptor EcR-A, is required to properly develop the P1 neurons since its silencing led to morphological changes associated with a reduction in adult female receptivity. Nonetheless, the results shown in this manuscript shed light on how Ecdysone plays a dual role in female adult receptivity, inhibiting it during larval development and enhancing it during metamorphic development. Unfortunately, this dual and opposite effect in two temporally different developmental stages has not been highlighted or explained.

      Strengths: This paper exhibits multiple strengths in its approach, employing a well-structured experimental methodology that combines genetic manipulations, behavioral assays, and molecular analysis to explore the impact of Ecdysone on regulating virgin female receptivity in Drosophila. The study provides clear and substantial findings, highlighting that removing PTTH, a positive Ecdysone regulator, increases virgin female receptivity. Additionally, the research expands into the temporal necessity of PTTH and Ecdysone function during development. 

      Weaknesses: 

      There are two important caveats with the data that are reflecting a weakness: 

      (1) Contradictory Effects of Ecdysone and PTTH: One notable weakness in the data is the contrasting effects observed between Ecdysone and its positive regulator PTTH. PTTH loss of function increases female receptivity, while ecdysone loss of function reduces it. Given that PTTH positively regulates Ecdysone, one would expect that the loss of function of both would result in a similar phenotype or at least a consistent directional change. 

      A1. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression in response to a high-titer ecdysone pulse similar to those of genetic control flies, including the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A by quantifying EcR-A mRNA levels in the whole body of newly formed PTTH -/- and PTTH -/+ prepupae. Indeed, PTTH -/- prepupae showed increased whole-body EcR-A expression compared with PTTH -/+ flies. Given the function of EcR-A in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain whether EcR-A expression in pC1 neurons is increased relative to genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. The feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is therefore one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      (2) Discordant Temporal Requirements for Ecdysone and PTTH: Another weakness lies in the different temporal requirements for Ecdysone and PTTH. The data from the manuscript suggest that PTTH is necessary during the larval stage, as shown in Figure 2 E-G, while Ecdysone is required during the pupal stage, as indicated in Figure 5 I-K. Ecdysone is a crucial developmental hormone with precisely regulated expression throughout development, exhibiting several peaks during both larval and pupal stages. PTTH is known to regulate Ecdysone during the larval stage, specifically by stimulating the kinetics of Ecdysone peaking at the wandering stage. However, it remains unclear whether pupal PTTH, expressed at higher levels during metamorphosis, can stimulate Ecdysone production during the pupal stage. Additionally, given the transient nature of the Ecdysone peak produced at wandering time, which disappears shortly before the end of the prepupal stage, it is challenging to infer that larval PTTH will regulate Ecdysone production during the pupal stage based on the current state of knowledge in the neuroendocrine field.  

      Considering these two caveats, the results suggest that the authors are witnessing distinct temporal and directional effects of Ecdysone on virgin female receptivity.  

      A2. First, it is necessary to clarify the precise timing of the manipulations of the Ptth gene and of PTTH neurons. In Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the prepupal stage to the end of the pupal stage. This “pupal stage” therefore includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods of manipulating Ptth and of manipulating EcR-A in pC1 neurons are thus continuous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. It is therefore possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      Reviewer #1 (Recommendations For The Authors): 

      In light of the significant caveat previously discussed, I will just make a few general suggestions: 

      (1) The paper primarily focuses on robust phenotypes, particularly in PTTH mutants, with a well-detailed execution of several experiments, resulting in thorough and robust outcomes. However, due to the caveat previously presented (opposite effect in larva and pupa), consider splitting the paper into two parts: Figures 1 to 4 deal with the negative effect of PTTH-Ecdysone on early virgin female receptivity, while Figures 5 to 7 focus on the positive metamorphic effect of Ecdysone in P1 metamorphic neurodevelopment. However, in this scenario, the mechanism by which PTTH loss of function increases female receptivity should be addressed.

      A3. Splitting the paper into two parts, dealing separately with PTTH function and with EcR function in pC1 neurons, would be a good suggestion if PTTH could not act on female receptivity through the function of EcR-A in pC1 neurons. However, because of the feedforward relationship between PTTH and EcR-A in newly formed prepupae, and because the periods of manipulating Ptth and EcR-A in pC1 neurons are continuous, it is possible that these two functions are not independent of each other. We have therefore kept the original structure.

      (2) Validate the PTTH mutants by examining homozygous mutant phenotypes and the dose-dependent heterozygous mutant phenotype using existing PTTH mutants. This could also be achieved using RNAi techniques.

      A4. We were unable to obtain other existing PTTH mutants. Instead, we decreased PTTH expression in PTTH neurons and in dsx+ neurons, but did not detect a phenotype similar to that of PTTH -/-. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      (3) Clarify if elav-Gal4 is not expressed in PTTH neurons and discuss how the rescue mechanisms work (hormonal, paracrine, etc.) in the text.

      A5. We tested for overlap between the elav-Gal4>GFP signal and PTTH stained with a PTTH antibody, and detected none. This suggests that elav-Gal4 is not expressed in PTTH neurons. However, we detected PTTH expression (PTTH antibody) in the CNS when PTTH was overexpressed using elav-Gal4>UAS-PTTH in the PTTH -/- background. Furthermore, this rescued the female receptivity phenotype of PTTH -/-. Insect PTTH isoforms share a similar probable signal peptide for secretion. Indeed, in addition to reaching the PG through axonal projections, PTTH also has an endocrine function, acting on its receptor Torso in light sensors to regulate larval light avoidance. PTTH overexpressed in other neurons through elav-Gal4>UAS-PTTH may act on the PG through this endocrine route and then induce ecdysone synthesis and release. Thus, although elav-Gal4 is not expressed in PTTH neurons, ecdysone synthesis triggered by PTTH from the hemolymph may rescue the PTTH -/- phenotype in female receptivity.

      (4) Consider renaming the new PTTH mutant to avoid confusion with the existing PTTHDelta allele. 

      A6. We have renamed our new PTTH mutant as PtthDelete.

      (5) Include the age of virgin females in each figure legend, especially for Figures 2 to 7, to aid in interpretation. This is essential information since wild-type early virgins (day 1) show no receptivity. In contrast, they reach a typical 80% receptivity later, and the mechanism regulating the first phase might differ from the one occurring later.

      A7. We have included the age of virgin females in each figure legend. 

      (6) Explain the relevance of observing that PTTH adult neurons are dsx-positive, as it's unclear why this observation is significant, considering that these neurons are not responsible for the observed receptivity effect in virgin females. Alternatively, address this in the context of the third instar larva or clarify its relevance.  

      A8. We decreased DsxF expression in PTTH neurons and did not detect a significant change in female receptivity. Almost all neurons regulating female receptivity, including pC1 neurons, express DsxF. We suppose that PTTH neurons interact with other DsxF-positive neurons that regulate female receptivity. Indeed, we detected overlap of dsx-LexA>LexAop-RFP and torso-Gal4>UAS-GFP during the larval stage. Furthermore, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity.

      These results suggest that PTTH regulates female receptivity not only through ecdysone, but possibly also by directly regulating other neurons, especially DsxF-positive neurons associated with female receptivity.

      Reviewer #2 (Public Review): 

      Summary: The authors tried to identify novel adult functions of the classical Drosophila juvenile-adult transition axis (i.e. ptth-ecdysone). Surprisingly, larval ptth-expressing neurons expressed the sex-specific doublesex gene, thus belonging to the sexual dimorphic circuit. Lack of ptth during late larval development caused enhanced female sexual receptivity, an effect rescued by supplying ecdysone in the food. Among many other cellular players, pC1 neurons control receptivity by encoding the mating status of females. Interestingly, during metamorphosis, a subtype of pC1 neurons required Ecdysone Receptor A in order to regulate such female receptivity. A transcriptomic analysis using pC1-specific Ecdysone signaling down-regulation gives some hints of possible downstream mechanisms.

      Strengths: the manuscript showed solid genetic evidence that lack of ptth during development caused enhanced copulation rate in female flies, which includes ptth mutant rescue experiments by overexpressing ptth as well as by adding ecdysone-supplemented food. They also present elegant data dissecting the temporal requirements of ptth-expressing neurons by shifting animals from non-permissive to permissive temperatures, in order to inactivate neuronal function (although not exclusively ptth function). By combining different drivers together with an EcR-A RNAi line, the authors also identified the Ecdysone receptor requirements of a particular subtype of pC1 neurons during metamorphosis. Convincing live calcium imaging showed no apparent effect of EcR-A in neural activity, although some effect on morphology is uncovered. Finally, bulk RNAseq shows differential gene expression after EcR-A down-regulation.

      Weaknesses: the paper has three main weaknesses. The first one refers to temporal requirements of ptth and ecdysone signaling. Whereas ptth is necessary during larval development, the ecdysone effect appears during pupal development. ptth induces ecdysone synthesis during larval development but there is no published evidence about a similar role for ptth during pupal stages. Furthermore, larval and pupal ecdysone functions are different (triggering metamorphosis vs tissue remodeling). The second caveat is the fact that ptth and ecdysone loss-of-function experiments render opposite effects (enhancing and decreasing copulation rates, respectively). The most plausible explanation is that both functions are independent of each other, also suggested by differential temporal requirements. Finally, in order to identify the effect in the transcriptional response of down-regulating EcR-A in a very small population of neurons, a scRNAseq study should have been performed instead of bulk RNAseq. 

      In summary, despite the authors providing convincing evidence that ptth and ecdysone signaling pathways are involved in female receptivity, the main claim that ptth regulates this process through ecdysone is not supported by results. More likely, they'd rather be independent processes. 

      B1. Clarification: in Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the start of the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet subsided. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are therefore continuous. In addition, pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. It is therefore possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      B2. During the formation of prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A by quantifying EcR-A mRNA levels in the whole body of newly formed PTTH -/- and PTTH -/+ prepupae. Indeed, EcR-A was increased in PTTH -/- compared with PTTH -/+ flies. Given the role of EcR-A in regulating gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain that EcR-A expression in pC1 neurons is increased relative to genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. The feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is therefore one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      B3. In the future, we will perform single-cell sequencing of pC1 neurons to explore the detailed molecular mechanism of female receptivity.

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments and suggestions: 

      - torso LOF in the PG to determine whether or not the ecdysone peak regulated by ptth (there is a 1-day delay in pupation) is responsible for the ptth effect in L3. In the same line, what happens if torso is downregulated in the pC1 neurons? Is there any effect on copulation rates? 

      B4. Because of the loss of phm-Gal4, we could not test female receptivity when decreasing Torso expression in the PG. However, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity. This suggests that PTTH regulates female receptivity not only through ecdysone but also by directly regulating dsx+ pC1 neurons.

      - What is the effect of down-regulating ptth in the dsx+ neurons? No ptth RNAi experiments are shown in the paper. 

      B5. We decreased PTTH expression in dsx+ neurons but did not detect a change in female receptivity. We also decreased PTTH expression in PTTH neurons using PTTH-Gal4, and again did not detect a change in female receptivity. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      - Why are most copulation rate experiments performed between 4-6 days after eclosion? ptth LOF effect only lasts until day 3 after eclosion (but very weak-fig 1). Again, this supports the idea that ptth and ecdysone effects are unrelated.

      B6. As in most other studies in flies, most behavioral experiments were performed 4-6 days after eclosion, because female receptivity reaches its peak at that time. Ptth LOF enhanced female receptivity from the first day after eclosion, which resembles precocious puberty. Wild-type females reach high receptivity at 2 days after eclosion (about 75% within 10 min). We suppose that the Ptth LOF effect only lasts until day 3 after eclosion because, beyond that point, the receptivity of control flies is too high to be exceeded.

      It is therefore not certain whether the effect of PTTH -/- on female receptivity disappears after the 3rd day of adulthood, and hence not certain whether the effects of PTTH and of EcR-A in pC1 neurons are unrelated.

      - The fact that pC1d neuronal morphology changes (and not pC1b) does not explain the effect of EcR-A LOF. Despite it is highlighted in the discussion, data do not support the hypothesis. How do these pC1 neurons look like in a ptth mutant animal regarding Calcium imaging and/or morphology? 

      B7. We examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and EcR-A expression in newly formed prepupae, PTTH deletion resulted in less established pC1-d neurons, the opposite of the effect of EcR-A reduction in pC1 neurons. However, it is not certain that EcR-A expression in pC1 neurons is increased when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment and does not act on pC1 neurons alone. In addition, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity, could not be seen clearly when EcR-A was decreased in pC1 neurons or when PTTH was deleted. Thus, abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      - The discussion is incomplete, especially the link between ptth and ecdysone; discuss why the phenotype is the opposite (ptth as a negative regulator of ecdysone in the pupa, for instance); the difference in size due to ptth LOF might be related to differential copulation rates.  

      B8. We have revised the discussion. We cannot exclude an effect of body size on female receptivity when PTTH was deleted or PTTH neurons were manipulated, although there is not enough evidence that body size affects female receptivity.

      - scheme of pC neurons may help. 

      B9. We tried to label pC1 neurons with GFP and sort them by flow cytometry, but did not succeed. This may be because the number of pC1 neurons in one brain is too low. We will try single-cell sequencing in the future.

      - Immunofluorescence images are too small.

      B10. We have resized the small images.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript shows that mutations that disable the gene encoding PTTH cause an increase in female receptivity (they mate more quickly), a phenotype that can be reversed by feeding these mutants the molting hormone, 20-hydroxyecdysone (20E). The use of an inducible system reveals that inhibition or activation of PTTH neurons during the larval stages increases and decreases female receptivity, respectively, suggesting that PTTH is required during the larval stages to affect the receptivity of the (adult) female fly. Showing that these neurons express the sex-determining gene dsx leads the authors to show that interfering with 20E actions in pC1 neurons, which are dsx-positive neurons known to regulate female receptivity, reduces female receptivity and increases the arborization pattern of pC1 neurons. The work concludes by showing that targeted knockdown of EcRA in pC1 neurons causes 527 genes to be differentially expressed in the brains of female flies, of which 123 passed a false discovery rate cutoff of 0.01; interestingly, the gene showing the greatest down-regulation was the gene encoding dopamine beta-monooxygenase.

      Strengths 

      This is an interesting piece of work, which may shed light on the basis for the observation noted previously that flies lacking PTTH neurons show reproductive defects ("... females show reduced fecundity"; McBrayer, 2007; DOI 10.1016/j.devcel.2007.11.003). 

      Weaknesses: 

      There are some results whose interpretation seem ambiguous and findings whose causal relationship is implied but not demonstrated. 

      (1) At some level, the findings reported here are not at all surprising. Since 20E regulates the profound changes that occur in the central nervous system (CNS) during metamorphosis, it is not surprising that PTTH would play a role in this process. Although animals lacking PTTH (rather paradoxically) live to adulthood, they do show greatly extended larval instars and a corresponding great delay in the 20E rise that signals the start of metamorphosis. For this reason, concluding that PTTH plays a SPECIFIC role in regulating female receptivity seems a little misleading, since the metamorphic remodeling of the entire CNS is likely altered in PTTH mutants. Since these mutants produce overall normal (albeit larger--due to their prolonged larval stages) adults, these alterations are likely to be subtle. Courtship has been reported as one defect expressed by animals lacking PTTH neurons, but this behavior may stand out because reduced fertility and increased male-male courtship (McBrayer, 2007) would be noticeable defects to researchers handling these flies. By contrast, detecting defects in other behaviors (e.g., optomotor responses, learning and memory, sleep, etc) would require closer examination. For this reason, I would ask the authors to temper their statement that PTTH is SPECIFICALLY involved in regulating female receptivity.  

      C1. We agree that it is not surprising that PTTH regulates, through ecdysone, the profound changes that occur in the CNS during metamorphosis. We also agree that the behavioral changes caused by PTTH mutations are not limited to female receptivity. We will temper the statement about the function of PTTH in female receptivity.

      We think our work makes two new points, although more evidence is needed in the future. First, PTTH deletion and the reduction of EcR-A in pC1 neurons during metamorphosis have opposite effects on female receptivity. Second, the development of pC1-b neurons, regulated by EcR-A during metamorphosis, is important for female receptivity.

      (2) The link between PTTH and the role of pC1 neurons in regulating female receptivity is not clear. Again, since 20E controls the metamorphic changes that occur in the CNS, it is not surprising that 20E would regulate the arborization of pC1 neurons. And since these neurons have been implicated in female receptivity, it would therefore be expected that altering 20E signaling in pC1 neurons would affect this phenotype. However, this does not mean that the defects in female receptivity expressed by PTTH mutants are due to defects in pC1 arborization. For this, the authors would at least have to show that PTTH mutants show the changes in pC1 arborization shown in Fig. 6. And even then the most that could be said is that the changes observed in these neurons "may contribute" to the observed behavioral changes. Indeed, the changes observed in female receptivity may be caused by PTTH/20E actions on different neurons.

      C2. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A by quantifying EcR-A mRNA levels in the whole body of newly formed PTTH -/- and PTTH -/+ prepupae. Indeed, EcR-A was upregulated in the whole body of newly formed PTTH -/- prepupae compared with PTTH -/+ flies. We also examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and EcR-A expression in newly formed prepupae, PTTH deletion resulted in less established pC1-d neurons, the opposite of the effect of EcR-A reduction in pC1 neurons.

      However, it is not certain that EcR-A expression in pC1 neurons increases relative to genetic controls when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment. In addition, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity through EcR-A function in pC1 neurons, could not be seen clearly. Thus, abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      (3) Some of the results need commenting on, or refining, or revising:

      a - For some assays PTTH behaves sometimes like a recessive gene and at other times like a semidominant, and yet at others like a dominant gene. For instance, in Fig. 1D-G, PTTH[-]/+ flies behave like wildtype (D), express an intermediate phenotype (E-F), or behave like the mutant (G). This may all be correct but merits some comment.

      C3. Female receptivity increases with age after eclosion, in both wild-type flies and PTTH mutants. On the first day after eclosion (Figure 1D), the loss of PTTH in PTTH[-]/+ flies may not be sufficient to cause the sexual precocity seen in PTTH -/-. From the second day after eclosion onward (Figure 1E-G), the loss of PTTH in PTTH[-]/+ flies is sufficient to enhance female receptivity compared with wild-type flies. However, after the 2nd day of adulthood, the female receptivity of all genotypes increases sharply. From the 3rd day of adulthood onward, the receptivity of PTTH -/- reaches its peak, and that of PTTH[-]/+ approaches that of PTTH -/- as the flies get older.

      b - Some of the conclusions are overstated. i) Although Fig. 2E-G does show that silencing the PTTH neurons during the larval stages affects copulation rate (E) the strength of the conclusion is tempered by the behavior of one of the controls (tub-Gal80[ts]/+, UAS-Kir2.1/+) in panels F and G, where it behaves essentially the same as the experimental group (and quite differently from the PTTH-Gal4/+ control; blue line).(Incidentally, the corresponding copulation latency should also be shown for these data.). ii) For Fig. 5I-K, the conclusion stated is that "Knock-down of EcR-A during pupal stage significantly decreased the copulation rate." Although strictly correct, the problem is that panel J is the only one for which the behavior of the control lacking the RNAi is not the same as that of the experimental group. Thus, it could just be that when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental. Again, the results shown in J are strictly speaking correct but the statement is too definitive given the behavior of one of the controls in panels I and K. Note also that panel F shows that the UAS-RNAi control causes a massive decrease in female fertility, yet no mention is made of this fact.

      C4. i) For all figures in the text, we describe the experimental group as significantly different only when it differs significantly from all control groups. In Figure 2E-G, both control groups differed from the experimental group only at the larval stage. The difference between the two control groups may be due to genetic background. We have described the statistical analysis in more detail in the legend. In addition, the corresponding copulation latency is now shown. ii) For Figure 5, we have revised the conclusion in the text as “when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental.” In addition, we now mention that the UAS-RNAi control causes a massive decrease in female fertility in panel F.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I am not sure that PTTH neurons should be referred to as "PG neurons". I am aware that this name has been used before but the PG is a gland that does not have neurons; it is not even innervated in all insects. 

      C5. Agree. “PG neurons” has been changed into “PTTH neurons”.

      (2) Fig. 1A warrants some explanation. One can easily imagine what it shows but a description is warranted. 

      C6. Explanation has been added.

      (3) When more than one genotype is compared it would be more useful to use letters to mark the genotypes that are not statistically different from each other rather than simply using asterisks. For instance, in the case of copulation latencies shown in Fig. 1E-G, which result does the comparison refer to? For example, since the comparisons are the result of ANOVAs, which comparison receives "*" in Fig. 1F? Is it PTTH[-]/+ vs PTTH[-]/PTTH[-] or vs. +/+? 

      C7. Referred genotypes and conditions were marked in all figure legends.

      (4) Fig. 1H: Why is copulation latency of PTTH[-]/PTTH[-]+elav-GAL4 significantly different from that of PTTH[-]/PTTH[-]? This merits a comment. Also, why was elav-GAL4 used to effect the rescue and not the PTTH-GAL4 driver? 

      C8. We cannot explain this phenomenon. It may be due to the different genetic backgrounds of the controls. We have mentioned this in the figure legend.

      (5) Fig. 2C, the genotype is written in a confusing order, GAL4+UAS should go together as should LexA+LexAop. 

      C9. We have revised this to avoid confusion.

      (6) In Fig. 2, is "larval stage" the same period that is shown in Fig. 3A? Please clarify.

      C10. We have clarified this in text and legends.

      (7) Fig. 6. The fact that pC1 neurons can be labeled using the pC1-ss2-Gal4 at the start of the pupal stage does not mean that this is when these neurons appear (are born), only when they start expressing this GAL4. Other types of evidence would be needed to make a statement about the birthdate of these neurons. 

      C11. We have revised the description of the appearance of pC1-ss2-Gal4>GFP. The precise birth time of pC1 neurons will be tested in the future.

      (8) The results shown in Fig. 7 are not pursued further and thus appear like a prelude to the next manuscript. Unless the authors have more to add regarding the role of one of the differentially expressed genes (e.g., dopamine beta-monooxygenase, which they single out) I would suggest leaving this result out. 

      C12. We have left this out.

      (9) Female flies lacking PTTH neurons were reported to show lower fecundity by McBrayer et al. (2007) and should be cited. 

      C13. This important study has been cited in the first manuscript. In this revision, we have cited it again when mentioning the lower fecundity of female flies lacking PTTH neurons.

      (10) Line 230: when were PTTH neurons activated? Since they are dead by 10h post-eclosion it isn't clear if this experiment even makes sense. 

      C14. Yes, we did this to confirm that PTTH neurons do not affect female receptivity at the adult stage.

      (11) Line 338: the statements in the figures say that PTTH function is required during the larval stages, not during metamorphosis 

      C15. This has been revised as “The result suggested that EcR-A in pC1 neurons plays a role in virgin female receptivity during metamorphosis. This is consistent with that PTTH regulates virgin female receptivity before the start of metamorphosis.”

      (12) Did the authors notice any abnormal behavior in males? McBrayer et al. (2007) mention that males lacking PTTH neurons show male-male courtship. This may remit to the impact of 20E on other dsx[+] neurons. 

      C16. Yes, we have noticed that males lacking PTTH show male-male courtship. It is possible that PTTH deletion induces male-male courtship through the impact of 20E on other dsx+ or fru+ neurons. We have added the corresponding discussion.

      (13) Line 145: please define CCT at first use 

      C17. CCT has been defined.

      (14) Overall the manuscript is well written; however, it would still benefit from editing by a native English speaker. I have marked a few corrections that are needed, but I probably missed some. 

      + Line 77: "If female is not willing..." should say "If THE female is not willing..." 

      + Line 78 "...she may kick the legs, flick the wings," should say "...she may kick HER legs, flick HER wings," 

      + Lines 93-94 this sentence is unclear: "...while the neurons in that fru P1 promoter or dsx is expressed regulate some aspects..." 

      + Line 108 "...similar as the function of hypothalamic-pituitary-gonadal (HPG).." should say "...similar TO the function of hypothalamic-pituitary-gonadal (HPG).."

      + Line 152 "Due to that 20E functions through its receptor EcR.." should say "BECAUSE 20E ACTS through its receptor EcR.."

      + Lines 155, 354 "unnormal" is not commonly used (although it is an English word); "abnormal" is usually used instead. 

      + Line 273: "....we then asked that whether ecdysone regulates" delete "that"

      + Sentences lines 306-309 need to be revised.

      C18. Thank you for your suggestions. We have revised the text as advised.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      The reviewer retained most of their comments from the previous reviewing round. In order to meet these comments and to further examine the dynamic nature of threat omission-related fMRI responses, we now re-analyzed our fMRI results using the single trial estimates. The results of these additional analyses are added below in our response to the recommendations for the authors of reviewer 1. However, we do want to reiterate that there was a factually incorrect statement concerning our design in the reviewer’s initial comments. Specifically, the reviewer wrote that “25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%.” We want to repeat that this is not what we did. 100% trials were always reinforced (100% reinforcement rate); 0% trials were never reinforced (0% reinforcement rate). For all other instructed probability levels (25%, 50%, 75%), the stimulation was delivered in 25% of the trials (25% reinforcement rate). We have elaborated on this misconception in our previous letter and have added this information more explicitly in the previous revision of the manuscript (e.g., lines 125-129; 223-224; 486-492).   
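      To make the distinction between instructed probability and actual reinforcement rate concrete, the mapping described above can be sketched in a few lines of Python. This is only an illustration of the design as stated in this response; the function name `reinforcement_rate` is hypothetical and is not taken from the authors' task code.

```python
# Hypothetical helper (not the authors' code): maps the instructed shock
# probability, in percent, to the reinforcement rate described in the response.
def reinforcement_rate(instructed_probability: int) -> float:
    if instructed_probability == 100:
        return 1.0   # 100% trials were always reinforced
    if instructed_probability == 0:
        return 0.0   # 0% trials were never reinforced
    if instructed_probability in (25, 50, 75):
        return 0.25  # intermediate levels all shared a 25% reinforcement rate
    raise ValueError(f"unexpected instructed probability: {instructed_probability}")


if __name__ == "__main__":
    for level in (0, 25, 50, 75, 100):
        print(level, reinforcement_rate(level))
```

      Under this mapping, omissions occur at a fixed 75% rate only for the three intermediate levels, which differs from the reviewer's reading that 25% of shocks were omitted at every level.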

      Reviewer #1 (Recommendations For The Authors): 

      I do not have any further recommendations, although I believe an analysis of learning-related changes is still possible with the trial-wise estimates from unreinforced trials. The authors' response does not clarify whether they tested for interactions with run, and thus the fact that there are main effects does not preclude learning. I kept my original comments regarding limitations, with the exception of the suggestion to modify the title. 

      We thank the reviewer for this recommendation. In line with their suggestion, we have now reanalyzed our main ROI results using the trial-by-trial estimates we obtained from the first-level omission>baseline contrasts. Specifically, we extracted beta-estimates from each ROI and entered them into the same Probability x Intensity x Run LMM we used for the relief and SCR analyses. Results from these analyses (in the full sample) were similar to our main results. For the VTA/SN model, we found main effects of Probability (F = 3.12, p = .04), and Intensity (F = 7.15, p < .001) (in the model where influential outliers were rescored to 2SD from mean). There was no main effect of Run (F = 0.92, p = .43) and no Probability x Run interaction (F = 1.24, p = .28). If the experienced contingency had interfered with the instructions, there should have been a Probability x Run interaction (with the effect of Probability only being present in the first runs). Since we did not observe such an interaction, our results indicate that even though some learning might still have taken place, the main effect of Probability remained present throughout the task.

      There is an important side note regarding these analyses: For the first level GLM estimation, we concatenated the functional runs and accounted for baseline differences between runs by adding run-specific intercepts as regressors of no-interest. Hence, any potential main effect of run was likely modeled out at first level. This might explain why, in contrast to the rating and SCR results (see Supplemental Figure 5), we found no main effect of Run. Nevertheless, interaction effects should not be affected by including these run-specific intercepts.

      Note that when we ran the single-trial analysis for the ventral putamen ROI, the effect of Intensity became significant (F = 3.89, p = .02). Results changed for neither the NAc nor the vmPFC ROI.
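      The rescoring of outliers "to 2SD from mean" mentioned above can be illustrated with a minimal Python sketch. It assumes that any value lying more than two sample standard deviations from the mean is pulled back to that boundary; the response does not state how influential outliers were identified, so this criterion and the function name `rescore_to_2sd` are assumptions made for illustration only.

```python
from statistics import mean, stdev


def rescore_to_2sd(values, n_sd=2.0):
    """Clip values lying more than n_sd sample SDs from the mean.

    Assumption: every value beyond the boundary counts as an outlier.
    The criterion actually used in the analysis is not given in the text.
    """
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    betas = [0.0] * 9 + [10.0]  # made-up values, not data from the study
    print(rescore_to_2sd(betas))
```

      Rescoring in this way keeps every trial in the model while limiting the leverage of extreme single-trial estimates.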

      Reviewer #2 (Public Review): 

      Comments on revised version: 

      I want to thank the authors for their thorough and comprehensive work in revising this manuscript. I agree with the authors that learning paradigms might not be a necessity when it comes to studying the PE signals, but I don't particularly agree with some of the responses in the rebuttal letter ("Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted."). This is of course a correct description of the conditioning paradigm, but the same can be said for an instructed design: the aversive outcome was either delivered or not. That being said, adopting the instructed design itself is legitimate in my opinion.

      We thank the reviewer for this comment. We have now modified the phrasing of this argument to clarify our reasoning (see lines 102-104: “First, these only included one level of aversive outcome: the electrical stimulation was either delivered at a fixed intensity, or omitted; but the intensity of the stimulation was never experimentally manipulated within the same task.”).  

      The reason why we mentioned that “the aversive outcome is either delivered or omitted” is because in most contemporary conditioning paradigms only one level of aversive US is used. In these cases, it is therefore not possible to investigate the effect of US Intensity. In our paradigm, we included multiple levels of aversive US, allowing us to assess how the level of aversiveness influences threat omission responding. It is indeed true that each level was delivered or not. However, our data clearly (and robustly across experiments, see Willems & Vervliet, 2021) demonstrate that the effects of the instructed and perceived unpleasantness of the US (as operationalized by the mean reported US unpleasantness during the task) on the reported relief and the omission fMRI responses are stronger than the effect of instructed probability.  

      My main concern, which the authors spent quite some length in the rebuttal letter to address, still remains about the validity of the different instructed probabilities. Although subjects were told that the trials were independent, the big difference between 75% and 25% would more than likely confuse the subjects, especially given that most of us would fall prey to the Gambler's fallacy (or the law of small numbers) to some degree. When the instruction and subjective experience collide, some form of inference or learning must have occurred, making the otherwise straightforward analysis more complex. Therefore, I believe that more rigorous, quantitative learning modeling work can dramatically improve the validity of the results. Of course, I also realize how much extra work is needed to append the computational part, but without it there is always a theoretical loophole in the current experimental design.

      We agree with the reviewer that some learning may have occurred in our task. However, we believe the most important question in relation to our study is: to what extent did this learning influence our manipulations of interest?  

      In our reply to reviewer 1, we already showed that a re-analysis of the fMRI results using the trial-by-trial estimates of the omission contrasts revealed no Probability x Run interaction, suggesting that – overall – the probability effect remained stable over the course of the experiment. However, inspired by the alternative explanation that was proposed by this reviewer, we now also assessed the role of the Gambler’s fallacy in a separate set of analyses. Indeed, it is possible that participants start to expect a stimulation more after more time has passed since the last stimulation was experienced. To test this alternative hypothesis, we specified two new regressors that calculated for each trial of each participant how many trials had passed since the last stimulation (or since the beginning of the experiment) either overall (across all trials of all probability types; hence called the overall-lag regressor) or per probability level (across trials of each probability type separately; hence called the lag-per-probability regressor). For both regressors, a value of 0 indicates that the previous trial was either a stimulation trial or the start of the experiment; a value of 1 means that the last stimulation trial was 2 trials ago, etc.
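
      A minimal Python sketch of how such lag regressors could be computed from a trial sequence is given below. It is illustrative only and is not the analysis code used for the study; the function names `overall_lag` and `lag_per_probability`, the list-based inputs, and the choice to treat the whole session as one sequence (no reset at run boundaries) are assumptions.

```python
def overall_lag(stimulated):
    """Count, for each trial, the non-stimulated trials since the last
    stimulation (or since the start of the sequence).

    stimulated: list of booleans, True if the trial ended in a stimulation.
    """
    lags = []
    trials_since_stim = 0
    for was_stimulated in stimulated:
        lags.append(trials_since_stim)
        # Reset after a stimulation, otherwise extend the current streak.
        trials_since_stim = 0 if was_stimulated else trials_since_stim + 1
    return lags


def lag_per_probability(stimulated, probability):
    """Same count, but tracked separately within each probability level."""
    lags = []
    trials_since_stim = {}
    for was_stimulated, level in zip(stimulated, probability):
        current = trials_since_stim.get(level, 0)
        lags.append(current)
        trials_since_stim[level] = 0 if was_stimulated else current + 1
    return lags


# Hypothetical trial sequence: True marks a stimulation trial.
stim = [False, False, True, False, False, False, True]
print(overall_lag(stim))
```

      In this sketch the fifth trial receives a lag of 1 because the last stimulation occurred 2 trials earlier, which matches the coding described above.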

      The results of these additional analyses are added in a supplemental note (see supplemental note 6), and referred to in the main text (see lines 231-236: “Likewise, a post-hoc trial-by-trial analysis of the omission-related fMRI activations confirmed that the Probability effect for the VTA/SN activations was stable over the course of the experiment (no Probability x Run interaction) and remained present when accounting for the Gambler’s fallacy (i.e., the possibility that participants start to expect a stimulation more when more time has passed since the last stimulation was experienced) (see supplemental note 6). Overall, these post-hoc analyses further confirm the PE-profile of omission-related VTA/SN responses.”).

      Addition to supplemental material (pages 16-18)

      Supplemental Note 6: The effect of Run and the Gambler’s Fallacy 

      A question that was raised by the reviewers was whether omission-related responses could be influenced by dynamical learning or the Gambler’s Fallacy, which might have affected the effectiveness of the Probability manipulation.  

      Inspired by this question, we exploratorily assessed the role of the Gambler’s Fallacy and the effects of Run in a separate set of analyses. Indeed, it is possible that participants start to expect a stimulation more when more time has passed since the last stimulation was experienced. To test this alternative hypothesis, we specified two new regressors that calculated for each trial of each participant how many trials had passed since the last stimulation (or since the beginning of the experiment) either overall (across all trials of all probability types; hence called the overall-lag regressor) or per probability level (across trials of each probability type separately; hence called the lag-per-probability regressor). For both regressors, a value of 0 indicates that the previous trial was either a stimulation trial or the start of the experiment; a value of 1 means that the last stimulation trial was 2 trials ago, etc.

      The new models including these regressors for each omission response type (i.e., omission-related activations for each ROI, relief, and omission-SCR) were specified as follows:   

      (1) For the overall lag:

      Omission response ~ Probability * Intensity * Run + US-unpleasantness + Overall-lag + (1|Subject).  

      (2) For the lag per probability level:

      Omission response ~ Probability * Intensity * Run + US-unpleasantness + Lag-per-probability : Probability + (1|Subject).

      Where US-unpleasantness scores were mean-centered across participants; “*” represents main effects and interactions, and “:” represents an interaction (without main effect). Note that we only included an interaction for the lag-per-probability model to estimate separate lag-parameters for each probability level.  
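
      For readers less familiar with this formula notation, the short Python sketch below spells out which fixed-effect terms the “*” operator generates and how the two models differ. It only expands the notation and does not fit any model; the helper name `expand_star` and the underscore-based term names are assumptions introduced for illustration.

```python
from itertools import combinations


def expand_star(factors):
    """Expand A * B * C into all main effects and all interactions,
    written with ':' as the interaction operator."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


# Shared factorial part of both models (3 main effects, 4 interactions).
factorial_terms = expand_star(["Probability", "Intensity", "Run"])

# Model 1: one common lag slope across all trials.
overall_lag_model = factorial_terms + ["US_unpleasantness", "Overall_lag"]

# Model 2: lag enters only through its interaction with Probability,
# which yields a separate lag slope for each probability level.
lag_per_probability_model = factorial_terms + [
    "US_unpleasantness",
    "Lag_per_probability:Probability",
]
```

      The random intercept per participant, written as (1|Subject) above, is part of both models and is not shown in this expansion.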

      The results of these analyses are presented in the tables below. Overall, we found that adding these lag-regressors to the model did not alter our main results. That is: for the VTA/SN, relief and omission-SCR, the main effects of Probability and Intensity remained. Interestingly, the overall-lag-effect itself was significant for VTA/SN activations and omission SCR, indicating that VTA/SN activations were larger when more time had passed since the last stimulation (beta = 0.19), whereas SCR were smaller when more time had passed (beta = -0.03). This pattern is reminiscent of the Perruchet effect, namely that the explicit expectancy of a US increases over a run of non-reinforced trials (in line with the gambler’s fallacy effect) whereas the conditioned physiological response to the conditional stimulus declines (in line with an extinction effect, Perruchet, 1985; McAndrew, Jones, McLaren, & McLaren, 2012). Thus, the observed dissociation between the VTA/SN activations and omission SCR might similarly point to two distinctive processes where VTA/SN activations are more dependent on a consciously controlled process that is subjected to the gambler’s fallacy, whereas the strength of the omission SCR responses is more dependent on an automatic associative process that is subjected to extinction. Importantly, however, even though the temporal distance to the last stimulation had these opposing effects on VTA/SN activations and omission SCRs, the main effects of the probability manipulation remained significant for both outcome variables. This means that the core results of our study still hold.   

      Next to the overall-lag effect, the lag-per-probability regressor was only significant for the vmPFC. A follow-up of the beta estimates of the lag-per-probability regressors for each probability level revealed that vmPFC activations increased with increasing temporal distance from the stimulation, but only for the 50% trials (beta = 0.47, t = 2.75, p < .01), and not the 25% (beta = 0.25, t = 1.49, p = .14) or the 75% trials (beta = 0.28, t = 1.62, p = .10).

      Author response table 1.

      F-statistics and corresponding p-values from the overall lag model

      (*) F-test and p-values were based on the model where outliers were rescored to 2SD from the mean. Note that when retaining the influential outliers for this model, the p-value of the probability effect was p = .06. For all other outcome variables, rescoring the outliers did not change the results. Significant effects are indicated in bold.

      Author response table 2.

      F-statistics and corresponding p-values from the lag per probability level model

      (*) F-test and p-values were based on the model where outliers were rescored to 2SD from the mean. Note that when retaining the influential outliers for this model, the p-value of the Intensity x Run interaction was p = .05. For all other outcome variables, rescoring the outliers did not change the results. Significant effects are indicated in bold.

      As the authors mentioned in the rebuttal letter, "selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants", only ~1/3 of the subjects actually showed strong evidence for the validity of the instructions. This further raises the question of whether the instructed design, due to the interference of false instruction and the dynamic learning among trials, is solid enough to test the hypothesis.

      We agree with the reviewer that a monotonic increase in anticipatory SCR with increasing probability instructions would provide the strongest evidence that the manipulation worked. However, it is well known that SCR is a noisy measure, and so the chances to see this monotonic increase are rather small, even if the underlying threat anticipation increases monotonically. Furthermore, between-subject variation is substantial in physiological measures, and it is not uncommon to observe, e.g., differential fear conditioning in one measure, but not in another (Lonsdorf & Merz, 2017). It is therefore not so surprising that ‘only’ 1/3 of our participants showed the perfect pattern of monotonically increasing SCR with increasing probability instructions. That being said, it is also important to note that not all participants were considered for these follow-up analyses because valid SCR data was not always available.

      Specifically, N = 4 participants were identified as anticipation non-responders (i.e. participants with a smaller average SCR to the clock on 100% than on 0% trials; pre-registered criterion) and were excluded from the SCR-related analyses, and N = 1 participant had missing data due to technical difficulties. This means that only 26 (and not 31) participants were considered for the post hoc analyses. Taking this information into account, 21 out of 26 participants (approximately 80%) showed stronger anticipatory SCR following 75% instructions compared to 25% instructions, and 11 out of 26 participants (approximately 40%) even showed the monotonic increase in their anticipatory SCR (see supplemental figure 4). Furthermore, although anticipatory SCR gradually decreased over the course of the experiment, there was no Run x Probability interaction, indicating that the effect of the instructions remained stable throughout the task (see supplemental figure 3).
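
      The counts above can be retraced with the short Python sketch below, which also shows one way to check the strict monotonic ordering 0% < 25% < 50% < 75% < 100% for a single participant. The helper name `is_strictly_increasing` and the example SCR values are hypothetical; only the participant counts are taken from the text.

```python
def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Participant counts reported in the text.
n_total = 31
n_non_responders = 4
n_missing = 1
n_valid = n_total - n_non_responders - n_missing  # 26 participants

share_75_over_25 = 21 / n_valid  # about 0.81
share_monotonic = 11 / n_valid   # about 0.42

# Hypothetical mean anticipatory SCR per instructed probability
# (0%, 25%, 50%, 75%, 100%) for one participant.
example_scr = [0.10, 0.18, 0.25, 0.31, 0.40]
example_is_monotonic = is_strictly_increasing(example_scr)
```

      A single reversal between two adjacent probability levels is enough to fail this strict criterion, which is one reason why it is demanding for a noisy measure such as SCR.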

      Reviewer #2 (Recommendations For The Authors):

      A more operational approach might be to break the trials into different sections along the timeline and examine how much the results might have been affected across time. I expect the manipulation checks would hold for the first one or two runs and the authors then would have good reasons to focus on the behavioral and imaging results for those runs. 

      This recommendation resembles the recommendation by reviewer 1. In our reply to reviewer 1, we showed the results of a re-analysis of the fMRI data using the trial-by-trial estimates of the omission contrasts, which revealed no Probability x Run interaction, suggesting that – overall – the probability effect remained (more or less) stable over the course of the experiment. For a more in-depth discussion of the results of this additional analysis, we refer to our answer to reviewer 1.

      Reviewer #3 (Public Review): 

      Comments on revised version: 

      The authors were extremely responsive to the comments and provided a comprehensive rebuttal letter with a lot of detail to address the comments. The authors clarified their methodology, and rationale for their task design, which required some more explanation (at least for me) to understand. Some of the design elements were not clear to me in the original paper. 

      The initial framing for their study is still in the domain of learning. The paper starts off with a description of extinction as the prime example of when threat is omitted. This could lead a reader to think the paper would speak to the role of prediction errors in extinction learning processes. But this is not their goal, as they emphasize repeatedly in their rebuttal letter. The revision also now details how using a conditioning/extinction framework doesn't suit their experimental needs. 

      We thank the reviewer for pointing out this potential cause of confusion. We have now rewritten the starting paragraph of the introduction to more closely focus on prediction errors, and only discuss fear extinction as a potential paradigm that has been used to study the role of threat omission PE for fear extinction learning (see lines 40-55). We hope that these adaptations are sufficient to prevent any false expectations. However, as we have mentioned in our previous response letter, not talking about fear extinction at all would also not make sense in our opinion, since most of the knowledge we have gained about threat omission prediction errors to date is based on studies that employed these paradigms.  

      Adaptation in the revised manuscript (lines 40-55):  

      “We experience pleasurable relief when an expected threat stays away1. This relief indicates that the outcome we experienced (“nothing”) was better than we expected it to be (“threat”). Such a mismatch between expectation and outcome is generally regarded as the trigger for new learning, and is typically formalized as the prediction error (PE) that determines how much there can be learned in any given situation2. Over the last two decades, the PE elicited by the absence of expected threat (threat omission PE) has received increasing scientific interest, because it is thought to play a central role in learning of safety. Impaired safety learning is one of the core features of clinical anxiety4. A better understanding of how the threat omission PE is processed in the brain may therefore be key to optimizing therapeutic efforts to boost safety learning. Yet, despite its theoretical and clinical importance, research on how the threat omission PE is computed in the brain is only emerging.  

      To date, the threat omission PE has mainly been studied using fear extinction paradigms that mimic safety learning by repeatedly confronting a human or animal with a threat predicting cue (conditional stimulus, CS; e.g. a tone) in the absence of a previously associated aversive event (unconditional stimulus, US; e.g., an electrical stimulation). These (primarily non-human) studies have revealed that there are striking similarities between the PE elicited by unexpected threat omission and the PE elicited by unexpected reward.”

      It is reasonable to develop a new task to answer their experimental questions. By no means is there a requirement to use a conditioning/extinction paradigm to address their questions. As they say, "it is not necessary to adopt a learning paradigm to study omission responses", which I agree with.  But the authors seem to want to have it both ways: they frame their paper around how important prediction errors are to extinction processes, but then go out of their way to say how they can't test their hypotheses with a learning paradigm.

      Part of their argument that they needed to develop their own task "outside of a learning context" goes as follows: 

      (1) "...conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested." 

      (2) "....in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses" 

      (3) "...because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction, which further reduces the necessary variability to properly evaluate the probability axiom" 

      These points seem to hinge on how tasks are "generally" constructed. However, there are many adaptations to learning tasks:

      (1) There is no rule that conditioning can't include different levels of aversive outcomes following different cues. In fact, their own design uses multiple cues that signal different intensities and probabilities. Saying that conditioning "generally only include one level of aversive outcome" is not an explanation for why "these paradigms are not tailored" for their research purposes. There are also several conditioning studies that have used different cues to signal different outcome probabilities. This is not uncommon, and in fact is what they use in their study, only with an instruction rather than through learning through experience, per se.

      (2) Conditioning/extinction doesn't have to occur fast. Just because people "generally learn fast" doesn't mean this has to be the case. Experiments can be designed to make learning more challenging or take longer (e.g., partial reinforcement). And there can be intra-individual differences in conditioning and extinction, especially if some cues have a lower probability of predicting the US than others. Again, because most conditioning tasks are usually constructed in a fairly simplistic manner doesn't negate the utility of learning paradigms to address PE axioms.

      (3) Many studies have tracked trial-by-trial BOLD signal in learning studies (e.g., using parametric modulation). Again, just because other studies "often pool across trials" is not an explanation for these paradigms being ill-suited to study prediction errors. Indeed, most computational models used in fMRI are predicated on analyzing data at the trial level. 

      We thank the reviewer for these remarks. The “fear conditioning and extinction paradigms” that we were referring to in this paragraph were the ones that have been used to study threat omission PE responses in previous research (e.g., Raczka et al., 2011; Thiele et al., 2021; Lange et al., 2020; Esser et al., 2021; Papalini et al., 2021; Vervliet et al., 2017). These studies have mainly used differential/multiple-cue protocols where either one (or two) CS+ and one CS- are trained in an acquisition phase and extinguished in the next phase. Thus, in these paradigms: (1) only one level of aversive US is used; (2) as safety learning develops over the course of extinction, there are relatively few omission trials during which “large” threat omission PEs can be observed (e.g. from the 24 CS+ trials that were used during extinction in Esser et al., the steepest decreases in expectancy – and thus the largest PE – were found in the first 6 trials); and (3) there is never absolute certainty that the stimulation will no longer follow. Some of these studies have indeed estimated the threat omission PE during the extinction phase based on learning models, and have entered these estimates as parametric modulators to CS-offset regressors. This is very informative. However, the exact model that was used differed per study (e.g. Rescorla-Wagner in Raczka et al. and Thiele et al.; or a Rescorla-Wagner–Pearce-Hall hybrid model in Esser et al.). We wanted to analyze threat omission responses without commitment to a particular learning model. Thus, in order to examine how threat omission responses vary as a function of probability-related expectations, a paradigm that has multiple probability levels is recommended (e.g. Rutledge et al., 2010; Ojala et al., 2022).

      The reviewer rightfully pointed out that conditioning paradigms (more generally) can be tailored to fit our purposes as well. Still, when doing so, the same adaptations as we outlined above need to be considered: i.e. include different levels of US intensity; different levels of probability; and conditions with full certainty about the US (non)occurrence. In our attempt to keep the experimental design as simple and straightforward as possible, we decided to rely on instructions for this purpose, rather than to train 3 (US levels) x 5 (reinforcement levels) = 15 different CSs. It is certainly possible to train multiple CSs of varying reinforcement rates (e.g. Grings et al., 1971; Ojala et al., 2022). However, given that US-expectation on each trial would primarily depend on the individual learning processes of the participants, using a conditioning task would make it more difficult to maintain experimental control over the level of US-expectation elicited by each CS. As a result, this would likely require more extensive training, and thus prolong the study procedure considerably. Furthermore, even though previous studies have trained different CSs for different reinforcement rates, most of these studies have only used one level of US. Thus, in order not to complicate our task too much, we decided to rely on instructions rather than to train CSs for multiple US levels (in addition to multiple reinforcement rates).

      We have tried to clarify our reasoning in the revised version of the manuscript (see introduction, lines 100-113):  

      “The previously discussed fear conditioning and extinction studies have been invaluable for clarifying the role of the threat omission PE within a learning context. However, these studies were not tailored to create the varying intensity and probability-related conditions that are required to systematically evaluate the threat omission PE in the light of the PE axioms. First, these only included one level of aversive outcome: the electrical stimulation was either delivered or omitted; but the intensity of the stimulation was never experimentally manipulated within the same task. As a result, the magnitude-related axiom could not be tested. Second, as safety learning progressively developed over the course of extinction learning, the most informative trials to evaluate the probability axiom (i.e. the trials with the largest PE) were restricted to the first few CS+ offsets of the extinction phase, and the exact number of these informative trials likely differed across participants as a result of individually varying learning rates. This limited the experimental control and necessary variability to systematically evaluate the probability axiom. Third, because CS-US contingencies changed over the course of the task (e.g. from acquisition to extinction), there was never complete certainty about whether the US would (not) follow. This precluded a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether brain responses to the threat omission are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”

      Again, the authors are free to develop their own task design that they think is best suited to address their experimental questions. For instance, if they truly believe that omission-related responses should be studied independent of updating. The question I'm still left puzzling is why the paper is so strongly framed around extinction (the word appears several times in the main body of the paper), which is a learning process, and yet the authors go out of their way to say that they can only test their hypotheses outside of a learning paradigm. 

      As we have mentioned before, the reason why we refer to extinction studies is because most evidence on threat omission PE to date comes from fear extinction paradigms.  

      The authors did address other areas of concern, to varying extents. Some of these issues were somewhat glossed over in the rebuttal letter by noting them as limitations. For example, the issue with comparing 100% stimulation to 0% stimulation, when the shock contaminates the fMRI signal. This was noted as a limitation that should be addressed in future studies, bypassing the critical point. 

      It is unclear to us what the reviewer means by “bypassing the critical point”. We argued in the manuscript that the contrast we initially specified and preregistered to study axiom 3 (fully predicted outcomes elicit equivalent activation) could not be used for this purpose, as it was confounded by the delivery of the stimulation. Because 100% trials always included the stimulation and 0% trials never included stimulation, there was no way to disentangle activations related to full predictability from activations related to the stimulation as such.

      Reviewer #3 (Recommendations For The Authors): 

      I'm not sure the new paragraph explaining why they can't use a learning task to test their hypotheses is very convincing, as I noted in my review. Again, it is not a problem to develop a new task to address their questions. They can justify why they want to use their task without describing (incorrectly in my opinion) that other tasks "generally" are constructed in a way that doesn't suit their needs. 

      For an overview of the changes we made in response to this recommendation, we refer to our reply to the public review.   

      We look forward to your reply and are happy to provide answers to any further questions or comments you may have.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The manuscript lacks the conclusion section to summarize their finding. The rebuttal is too simple to state where and in which way the authors have made their revisions. In this case, please return this revision to the authors and ask them revise their contribution carefully.

      We now indicate in detail the places and the way that we make revisions. Specific revisions in sentences/words are marked with blue color in the main text where necessary. A conclusion is now provided at the end of the main text (lines 264-275). Other major revisions include:

      (1) We add Fig. 5 as a new figure to reconstruct ovule structure of Alasemenia and to compare three- and four-winged ovules. This is followed by Fig. 6 relating to mathematical analysis.

      (2) We re-organize (sequences of some) paragraphs and revise sentences in Discussion, and then divide Discussion into three parts: “Late Devonian acupulate ovules and their functions” (lines 124-150), “Late Devonian winged ovules and evolution of ovular wings” (lines 151-179), “Mathematical analysis of wind dispersal of ovules with 1-4 wings” (lines 180-262).

      (3) We move “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section from the supplementary information to the main text as the third part of Discussion (lines 180-262). The original paragraph headed with Mathematical analysis in Results is now modified and inserted to “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph in the original Supplementary information is now greatly modified and presented at the end of “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      (4) With moving “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section from the supplementary information to the main text, five references are accordingly added to the list (lines 278-282, 296-300, 329-330).

      (5) We change the format of citing references in the main text.

      We have therefore returned your manuscript to you to allow you to make the updates necessary to address the editors' comments. Please ensure that you also update your preprint with the newly revised version once complete.

      Many thanks for this allowance and we now make the necessary updates to address the editors’ and reviewers’ comments. At the same time, the new version is also provided as a preprint.

      Reviewer #1 (Public Review):

      Summary:

      Winged seeds or ovules from the Devonian are crucial to understanding the origin and early evolutionary history of wind dispersal strategy. Based on exceptionally well-preserved fossil specimens, the present manuscript documented a new fossil plant taxon (new genus and new species) from the Famennian Series of Upper Devonian in eastern China and demonstrated that three-winged seeds are more adapted to wind dispersal than one-, two- and four-winged seeds by using mathematical analysis.

      Many thanks for these positive comments by the reviewer.

      Strengths:

      The manuscript is well organised and well presented, with superb illustrations. The methods used in the manuscript are appropriate.

      Many thanks for the reviewer’s positive comments.

      Weaknesses:

      I would only like to suggest moving the "Mathematical analysis of wind dispersal of ovules with 1-4 wings" section from the supplementary information to the main text, leaving the supplementary figures as supplementary materials.

      Ok, following the suggestion, we have moved this “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now represents the third part of Discussion. The original paragraph headed with Mathematical analysis in Results is now modified and inserted to “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph in the original Supplementary information is now greatly modified and presented at the end of “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      Reviewer #2 (Public Review):

      Summary:

      This manuscript described the second earliest known winged ovule without a cupule in the Famennian of Late Devonian. Using mathematical analysis, the authors suggest that the integuments of the earliest ovules without a cupule, as in the new taxon and Guazia, evolved functions in wind dispersal.

      Yes, these include our description, mathematical analysis and suggestion.

      Strengths:

      The new ovule taxon's morphological part is convincing. It provides additional evidence for the earliest winged ovules, and the mathematical analysis helps to understand their function.

      Many thanks for these positive comments of the reviewer.

      Weaknesses:

      The discussion should be enhanced to clarify the significance of this finding. What is the new advance compared with the Guazia finding? The authors can illustrate the character transformations using a simplified cladogram. The present version of the main text looks flat.

      To clarify the significance of this finding, we have enhanced the discussion in the following respects. We have reorganized the Discussion and divided it into three parts, entitled “Late Devonian acupulate ovules and their functions” (lines 124-150), “Late Devonian winged ovules and evolution of ovular wings” (lines 151-179), and “Mathematical analysis of wind dispersal of ovules with 1-4 wings” (lines 180-262). The third part is adapted from the original Supplementary information.

      Regarding the new advance of Alasemenia compared with Guazia, and the illustration of character transformations:

      (1) we now provide a new figure (Fig. 5) that reconstructs the ovule of Alasemenia and compares the structure of these two ovules.

      (2) in the second part of Discussion, we now say “As in Alasemenia (Fig. 5a), the integumentary wings of acupulate ovule of Guazia are broad, thin and fold inwards along the abaxial side, but their numbers are four in each ovule and their free portions usually arch centripetally (Fig. 5c; Wang et al., 2022, Figure 5).”

      (3) also in the second part of Discussion, we now say “Compared to Warsteinia with short and straight wings and Guazia with long but distally inwards curving wings, Alasemenia with longer and outwards extending wings would efficiently reduce the rate of descent and be more capably moved by wind. Furthermore, the quantitative analysis in mathematics indicates that three-winged ovules such as Alasemenia are more adapted to wind dispersal than four-winged ovules including Warsteinia and Guazia (see following).”

      (4) in the third part of Discussion, we now say “Significantly, the maximum windward area of each wing of Alasemenia is greater than that of Guazia and Warsteinia with four wings. All these factors suggest that Alasemenia is well adapted for anemochory.”

      (5) in Conclusion, we now say “Compared to Famennian four-winged ovules of Warsteinia and Guazia, Alasemenia with three distally outwards extending wings shows advantage in anemochory.”

      Recommendations for the authors:

      Ok, we have undertaken some revisions and kept some of the original content.

      Reviewer #1 (Recommendations For The Authors):

      I would only like to suggest moving the "Mathematical analysis of wind dispersal of ovules with 1-4 wings" section from the supplementary information to the main text, leaving the supplementary figures as supplementary materials.

      Ok, following the suggestion, we have now moved this “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now forms the third part of the Discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) The mathematical part as the supplement can be incorporated into the text.

      Ok, following the suggestion, we have now moved this “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now forms the third part of the Discussion. The original paragraph headed “Mathematical analysis” in Results has been modified and inserted into the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph of the original Supplementary information has been substantially revised and now appears at the end of the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      (2) The comparisons between three- or four-winged ovules are not addressed enough.

      We now add Fig. 5 as a new figure. Based on this figure and revisions, the comparisons between three- and four-winged ovules now include:

      a) “Their integumentary wings illustrate diversity in number (three or four per ovule), length, folding or flattening, and being straight or curving distally. As in Alasemenia (Fig. 5a), the integumentary wings of acupulate ovule of Guazia are broad, thin and fold inwards along the abaxial side, but their numbers are four in each ovule and their free portions usually arch centripetally (Fig. 5c; Wang et al., 2022, Figure 5). In contrast to Alasemenia, Warsteinia has four integumentary wings without folding and their free portions are short and straight (Rowe, 1997, TEXT-FIG. 4).” (lines 154-160).

      b) “Furthermore, the quantitative analysis in mathematics indicates that three-winged ovules such as Alasemenia are more adapted to wind dispersal than four-winged ovules including Warsteinia and Guazia (see following).” (lines 166-168).

      c) “The relative wind dispersal efficiency of three-winged seeds is obviously better than that of single- and two- winged seeds, and is close to that of four-winged seeds (Fig. 6). In addition, three-winged seeds have the most stable area of windward, which also ensures the motion stability in wind dispersal. Significantly, the maximum windward area of each wing of Alasemenia is greater than that of Guazia and Warsteinia with four wings.” (lines 256-261).

      d) “Compared to Famennian four-winged ovules of Warsteinia and Guazia, Alasemenia with three distally outwards extending wings shows advantage in anemochory.” (lines 272-274).

      (3) The significance of this finding should be well summarized with solid evidence.

      It has been summarized in Abstract (lines 19-28) and is now further summarized especially in the newly provided Conclusion (lines 264-275).

    1. Author response:

      Reviewer #1

      - The entire study is based on only 2 adult animals, that were used for both the single cell dataset and the HCR. Additionally, the animals were caught from the ocean preventing information about their age or their life history. This makes the n extremely small and reduces the confidence of the conclusions. 

      This statement is incorrect. While the scRNAseq was indeed performed in two animals (n=2), the HCR-FISH was performed in 3-5 animals (depending on the probe used). These were different animals from those used for the scRNAseq. We are partly responsible for this confusion, since we did not state the number of animals used for the HCR-FISH in the manuscript.

      - All the fluorescent pictures present in this manuscript present red nuclei and green signals being not color-blind friendly. Additionally, many of the images lack sufficient quality to determine if the signal is real. Additional images of a control animal (not eviscerated) and of a negative control would help data interpretation. Finally, in many occasions a zoomed out image would help the reader to provide context and have a better understanding of where the signal is localized. 

      Fluorescent photos will be changed to color-blind friendly colors. 

      Diagrams, arrows and new photos will be included to guide readers to the signal or labeling in cells. In the original manuscript, 6 out of 7 cluster validations included a photo of a normal, non-eviscerated control. We will make certain that this is highlighted in the resubmission and that ALL figures with HCR-FISH labeling include data from control animals.

      - The Authors frequently report the percentage of cells with a specific feature (either labelled or expressing a certain gene or belonging to a certain cluster). This number can be misleading since that is calculated after cell dissociation and additional procedures (such as staining or sequencing and dataset cleanup) that can heavily bias the ratio between cell types. Similarly, the Authors cannot compare cell percentage between anlage and mesentery samples since that can be affected by technical aspects related to cell dissociation, tissue composition and sequencing depth. 

      The Reviewer has correctly identified the limitations of using cell percentages in scRNA-seq analyses. However, these percentages do offer a general overview of the sequenced cell populations and highlight potential differences between samples. In addition, these percentages, as noted by the Reviewer, not only expose the shortcomings of the dissociation methods but also provide some explanation for the absence of particular cell populations, as we describe in the manuscript. In our future resubmission, we will acknowledge these limitations and inform readers of any potential biases introduced by relying on these numbers.

      - The Authors decided to validate only a few clusters and in many cases there are no positive controls (such as specific localization, specific function, changes between control and regenerating animals, co-stain) that could actually validate the cluster identity and the specificity of the selected marker. There is no validation of the trajectory analysis and there is no validation of the proliferating cluster with H3P or BrdU stainings. 

      We validated the seven clusters that were important to reach our conclusions. Six of these had controls of normal (uneviscerated) intestine. Nonetheless, we will increase the number of cluster validations and include the dividing cell cluster using BrdU.

      - It is not clear what is already known about holothurian intestine regeneration and what are the new findings in this manuscript. The Authors reference several papers throughout the results section, mentioning that the steps of regeneration, the proliferating cells, some of the markers and some of the cell composition of mesenteries and anlages were already known. 

      The manuscript presents several novel findings on holothurian intestine regeneration, including:

      - The integration of multiple cellular processes, reported for the first time within a single species, along with the identification of the specific mRNAs expressed by each involved cell population.

      - A comparative analysis of the sea cucumber anlage structure, highlighting its similarities to previously described blastemal structures.

      - The identification of the potential dedifferentiated cell populations that form the foundation of the anlage, serving as the epicenter for proliferating and differentiating cells.

      We will ensure that these and other significant findings are prominently emphasized in the resubmitted manuscript.

      Reviewer #2

      - The spatial context of the RNA localization images is not well represented, making it difficult to understand how the schematic model was generated from the data. In addition, multiple strong statements in the conclusion should be better justified and connected to the data provided.

      As explained above we will make an effort to provide a better understanding of the cellular/tissue localization of the labeled cells. Similarly, we will revise the conclusions so that the statements made are well justified.

      Reviewer #3

      - Possible theoretical advances regarding lineage trajectories of cells during sea cucumber gut regeneration, but the claims that can be made with this data alone are still predictive.

      We are conscious that the results from these lineage trajectories are still predictive and will emphasize this in the text. Nonetheless, they are an important part of our analyses and provide the theoretical basis for future experiments.

      - Better microscopy is needed for many figures to be convincing. Some minor additions to the figures will help readers understand the data more clearly.

      As explained above we will make an effort to provide a better understanding of the cellular/tissue localization of the labeled cells. Similarly, we will revise the conclusions so that the statements made are well justified.

    1. Author response:

      We sincerely appreciate the reviewers' time, effort, and thoughtful feedback, which have significantly contributed to our research.

      A key concern raised was the potential overinterpretation of our data. While the reviewers acknowledged our identification of a possible synchronization mechanism among active mitral and tufted cells (MTCs) that is distance-independent, they correctly pointed out that we did not provide direct evidence showing how ensemble MTCs synchronize. We concur with their assessment and will address this in our forthcoming response to ensure a precise interpretation of our findings.

      Another concern raised involves the interpretation of results obtained under Ketamine anesthesia. Ketamine is an antagonist of NMDA receptors, which play a crucial role at MTC-GC reciprocal synapses, so this might impact our conclusions. To address this, we will include analyses demonstrating that optogenetic activation of granule cells (GCs) in an anesthetized state inhibits recorded MTCs during baseline but does not affect odor-evoked MTC firing rates. Additionally, we will thoroughly discuss the potential influence of Ketamine anesthesia on GC-MTC synapses and its implications for our findings.

      Lastly, in our detailed response to the reviewers' comments, we will discuss several recent studies that are particularly relevant to our research. We will also expand on our hypothesis that parvalbumin-positive cells in the olfactory bulb may serve as key mediators of the activity- and distance-dependent lateral inhibition observed in our findings.

    1. Author response:

      We thank both reviewers for their constructive comments. We will do our best to incorporate the requested analyses and answer the reviewers’ questions in the revision.

    1. Author response:

      General comments, factual mistakes:

      Reviewer 1 - Summary: “This study builds on the observation that the kynurenine pathway is required in the conceptus, as HOO null embryos are sensitive to maternal deficiency of NAD precursors (vitamin B3) and tryptophan, and narrows the window of sensitivity to a 3-day period.”

      Correction:

      Vitamin B3 should not be in parentheses, because vitamin B3 and tryptophan are both NAD precursors. We also suggest that the second half of this sentence be changed to “…and narrows the window of sensitivity to a 3-day period from embryonic day 7.5 to E10.5.” Currently, it reads as if Haao-null embryos are sensitive to any 3-day period of maternal NAD precursor restriction.

      Reviewer 1 – Strengths: “Abnormalities develop under conditions of maternal vitamin B3 deficiency, indicating…”

      Correction:

      We suggest replacing “vitamin B3 deficiency” with “NAD deficiency”, as this is more accurate.

      Reviewer 2 – Strengths: “…and then re-analysis of RNA-seq datasets suggested the endoderm was the cell source of NAD synthesis.”

      Correction:

      We suggest re-phrasing this sentence to “…and then re-analysis of RNA-seq datasets suggested the yolk sac endoderm cells are the source of NAD de novo synthesis.”

      Reviewer 1 (Public Review):

      However, without analysis of embryos at later stages in this experiment it is not known how long is needed for NAD synthesis to be recovered - and therefore until when the period of exposure to insufficient NAD lasts. This information would inform the understanding of the developmental origin of the observed defects.

      We are currently seeking funds to investigate the developmental origin of the observed defects. This study includes assessing how the timing of maternal NAD precursor restriction corresponds to the timing of NAD deficiency in the embryo.

      More importantly, there is still a question of whether in addition to the yolk sac, there is HAAO activity within the embryo itself prior to E12.5 (when it has first been assayed in the liver - Figure 1C).

      We have additional data showing that at E11.5 the embryo has no HAAO activity. We also tested E14.5 embryos with their livers removed, and these also do not have HAAO activity. We are planning to include these data sets in the revised version of this manuscript.

      Reviewer 2 (Public Review):

      Page 4 and Table S4. The descriptors for malformations of organs such as the kidney and vertebrae are quite vague and uninformative. More specific details are required to convey the type and range of anomalies observed as a consequence of NAD deficiency.

      Kidney defects were classified as described in Cuny et al. 2020 PNAS (PMID:32015132). In brief, kidneys with a tip-to-tip length of ≤ 1.5 mm were counted as hypoplastic, because the average length of a normal kidney at E18.5 is 2.98 mm (2.75-3.375 mm). The one dysmorphic kidney we observed in our dataset had a cyst. We plan to include this information plus more details of the observed vertebral defects in the revised version of this manuscript.

      Can the authors define whether the role of the NAD pathway in a couple of tissue or organ systems is the same? By this I mean, is the molecular or cellular effect of NAD deficiency the same in the vertebrae and in organs such as the kidney? What unifies the effects on these specific tissues and organs, and are all tissues and organs affected? If some are not, can the authors explain why they escape the need for the NAD pathway?

      We agree that this is a very important question, but consider it beyond the scope of this manuscript. To elucidate the underlying cellular and molecular mechanisms in individual organs will require a multiomic approach because NAD is involved in hundreds of molecular and cellular processes affecting gene expression, protein levels, metabolism, etc. For details of NAD functions that have relevance to embryogenesis see Dunwoodie et al 2023 https://doi.org/10.1089/ars.2023.0349. Furthermore, organs develop at different times during embryogenesis with both distinct, but in some cases shared, molecular and cellular processes. Relating these to specific NAD functions is the challenge. We are currently seeking funds to investigate how NAD deficiency disrupts organogenesis.

      Page 5 and Figure 6C. The expectation and conclusion for whether specific genes are expressed in particular cell types in scRNA-seq datasets depend on the number of cells sequenced, the technology (methodology) used, the depth of sequencing, and also the resolution of the analysis. It is therefore essential to perform secondary validation of the analysis of scRNA-seq data. At a minimum, the authors should perform in situ hybridization or immunostaining for Tdo2, Afmid, Kmo, Kynu, Haao, Qprt, and Nadsyn1 or some combination thereof at multiple time points during early mouse embryogenesis to truly understand the spatiotemporal dynamics of expression and NAD synthesis.

      We have tested antibodies against HAAO, KYNU, and QPRT in adult mouse liver samples (the main site of NAD de novo synthesis), and they produced non-specific bands in western blotting. Therefore, in situ immunostaining studies on embryonic tissues are not feasible. We will investigate the possibility of effectively localizing transcripts of NAD de novo synthesis enzymes using in situ hybridization.

      Absolute functional proof of the yolk sac endoderm as being essential and required for NAD synthesis in the context of CNDD might require conditional deletion of Haao in the yolk sac versus embryo using appropriate Cre driver lines or, in the absence of a conditional allele, could be performed by tetraploid embryo-ES cell complementation approaches. But temporal dietary intervention can also approximate the same thing by perturbing NAD synthesis when the yolk sac is the primary source versus when the liver becomes the primary source in the embryo.

      Reviewer 1 has a related comment. We have additional data showing that at E11.5 the embryo has no HAAO activity, like the placenta. Similarly, E14.5 embryos with their livers removed do not have HAAO activity either. We believe this provides sufficient proof that the yolk sac endoderm is the only site of NAD de novo synthesis activity in the conceptus until the liver has formed and takes over this function.

    1. Author response:

      We are grateful to the reviewers for recognizing the importance of our work and for their helpful suggestions. We will revise our manuscript accordingly. In the meantime, we would like to provide provisional responses to the key questions and comments from the reviewers.

      (1) Both reviewers asked why we chose 24-120 hpf to measure the apoptotic rates. We chose this time window for the following two reasons: 1) Previous studies showed that although the time windows of motor neuron death vary among chick (E5-E10), mouse (E11.5-E15.5), rat (E15-E18) and human (11-25 weeks of gestation), they all correspond to the developmental period in which motor neurons make contact with muscle cells. The contact between zebrafish motor neurons and muscle cells occurs before 72 hpf, which is included in our observation time window. 2) Zebrafish complete hatching during 48-72 hpf, and most organs form before 72 hpf. More importantly, zebrafish start swimming around 72 hpf, indicating that motor neurons are fully functional.

      Thus, we are confident that this 24-120 hpf time window covers the period during which motor neurons undergo programmed cell death in early zebrafish development. We frequently used “early development” in this manuscript to describe our observation. However, we omitted “early” from our title. We will add “early” to the title in the revised version.

      (2) Both reviewers also asked about the neurogenesis of motor neurons. Previous studies have shown that the production of spinal cord motor neurons largely ceases before 48 hpf and then the motor neurons remain largely constant until adulthood. Our observation time window covers the major motor neuron production process. Therefore, we believe that neurogenesis will not affect our data and conclusions.

      (3) Both reviewers questioned the specificity of using the mnx1 promoter to label motor neurons. The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons. Furthermore, a few of those green cell bodies turned into blue apoptotic bodies inside the spinal cord and changed to blue axons in the muscle regions at the same time, which strongly suggests that those apoptotic neurons are not interneurons. Although the mnx1 promoter might have labeled some interneurons, this will not affect our major finding that only a small portion of motor neurons died during zebrafish early development.

      (4) Reviewer 2 is concerned that the estimated 50% of motor neuron death was in limb-innervating motor neurons but not in body wall-innervating motor neurons. The death of limb-innervating motor neurons has been extensively studied in chicks and rodents, as limbs are readily amenable to operations such as amputation. However, previous studies have shown that this dramatic motor neuron death occurs not only in limb-innervating motor neurons but also in other spinal cord motor neurons. In our manuscript, we studied the naturally occurring motor neuron death in the whole spinal cord during the early stage of zebrafish development.

      (5) Reviewer 2 mentioned that we ignored the death of an identified motor neuron. Our study examined overall motor neuron apoptosis rather than the death of a specific type of motor neuron, so we did not emphasize the death of VaP motor neurons. We agree that the dead motor neurons observed in our manuscript include VaP motor neurons. However, other types of dead motor neurons were also observed in our study. The reasons are as follows: 1) VaP primary motor neurons die before 36 hpf, but our study found that motor neurons died after 36 hpf and even at 84 hpf. 2) The VaP motor neuron is located alongside the CaP motor neuron, that is, at the caudal region of the motor neuron cluster. Although it is rare, we did observe the death of motor neurons in the rostral region of the motor neuron cluster. 3) There is at most one VaP motor neuron in each hemisegment. Although our data showed that usually one motor neuron died in each hemisegment, we did observe that sometimes more than one motor neuron died in the motor neuron cluster. We will include this information in the revised manuscript.

      (6) For the morpholinos, we did not confirm the downregulation of the target genes. These morpholino-related data are a minor part of our manuscript and should not affect our major findings. Thus, we do not think that we missed “important” controls. We will perform experiments to confirm the efficiency of the morpholinos or remove these morpholino-related data from the revised version.

    1. Author Response:

      We would like to thank the editors and reviewers for the careful consideration of our manuscript and their many helpful comments. We would like to provide provisional author responses to address the public reviews.

      Response to Reviewer 1:

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 has a role beyond mitochondrial fission in zygotes. However, there are several possible reasons why the Drp1 KO zygotes differ from the somatic cell Drp1 KO models.

      First, the reviewer mentions that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images in Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al. Current Biology 2014, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study.

      These mitochondrial morphologies in Drp1-deficient oocytes/zygotes may be attributed to the unique mitochondrial architecture in these cells. Mitochondria in oocytes have the shape of a small sphere with irregular cristae located peripherally or transversely. These structural features might be the cause of insensitivity or resistance to inner membrane fusion. In addition, in our previous study (Wakai et al., Molecular Human Reproduction 2014, Fig. 2), overexpression of mitochondrial fusion factors in oocytes resulted in mitochondrial aggregation when outer membrane fusion factor Mfn1/Mfn2 was overexpressed, while overexpression of Opa1 did not cause any morphological changes. Thus, while mitochondria in oocytes/zygotes divide actively, complete fusion, including the inner membrane, as seen in somatic cells, is unlikely to occur.

      As for mitochondrial transport, we do not entirely discard its role. Although mitochondrial intrinsic dynamics such as fission are of primary importance for the mitochondrial distribution and partitioning in embryos, the regulation of dynamics by the cytoskeleton may be important and thus needs further study, as the reviewer pointed out.

      Response to Reviewer 2:

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      We will indicate the time after hCG as the reviewer pointed out. The only problem is that in this experiment, there may be a slight deviation from the actual mitochondrial distribution change (Fig. S1A) due to the manipulation time for Trim-Away (since it was performed outside of the incubator). Also, no significant delay in pronuclear formation or embryonic development was observed in Drp1-depleted zygotes.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, because various RNAs were injected to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 hours of incubation to allow their fluorescent proteins to be sufficiently expressed. Therefore, for the western blot analysis, samples were collected to reflect their condition at the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      We would like to add quantitative data on mitochondrial aggregation in Drp1-depleted embryos.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      We would like to add the quantitative results of the band intensity for the western blot analysis. The number of embryos analyzed is given in the figure legends; pooled samples of 20 (Fig. 4) to 30 (Fig. 2) embryos were used.

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      We will present quantitative results on the accumulation of ROS.

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      It has been reported that Drp1 regulates the meiotic spindle through the spindle assembly checkpoint (SAC) (Zhou et al., Nature Communications 2022). We would like to mention the possibility raised by the reviewer in the Discussion.

      Response to Reviewer 3:

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      - Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      We would like to add a comment regarding cristae morphology.

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      ATeam fluorescence is analyzed using a regular fluorescence microscope, not a confocal laser microscope, in order to analyze the intensity in the whole embryo (or the whole blastomere). Therefore, we are currently unable to obtain images of localized areas within the cell (e.g., around the spindle) as expected by the reviewer; as shown in the images in Figure 3-figure supplement 1C, there is a tendency to see high ATP levels at the cell periphery, but further analysis is needed for clear and definitive results.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Aggregated mitochondria are localized toward the cell center, but do not behave in such a way that they are preferentially concentrated near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca2+ response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We assume that what the reviewer has pointed out is right. However, although we were able to show the bias in Ca2+ store levels between blastomeres of Drp1-depleted embryos, we did not stain mitochondria simultaneously, so we are unable to give details such as whether there are more Ca2+ stores in the blastomere that inherited more mitochondria or fewer Ca2+ stores in the blastomere with more aggregated mitochondria.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked accumulation of mitochondria around the spindle is unique to the first cleavage and seems to coincide with the migration of the pronuclei toward the center. Since the assembly of the male and female pronuclei is also an event unique to the first cleavage, abnormalities such as binucleation due to mitochondrial misplacement are thought to be seen only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is an interesting study investigating the mechanisms underlying membrane targeting of the NLRP3 inflammasome and reporting a key role for the palmitoylation-depalmitoylation cycle of cys130 in NLRP3. The authors identify ZDHHC3 and APT2 as the specific ZDHHC and APT/ABHD enzymes that are responsible for the S-acylation and de-acylation of NLRP3, respectively. They show that the levels of ZDHHC3 and APT2, both localized at the Golgi, control the level of palmitoylation of NLRP3. The S-acylation-mediated membrane targeting of NLRP3 cooperates with polybasic domain (PBD)-mediated PI4P-binding to target NLRP3 to the TGN under steady-state conditions and to the disassembled TGN induced by the NLRP3 activator nigericin.

      However, the study has several weaknesses in its current form as outlined below.

      (1) The novelty of the findings concerning cys130 palmitoylation in NLRP3 is unfortunately compromised by recent reports on the acylation of different cysteines in NLRP3 (PMID: 38092000), including palmitoylation of the very same cys130 in NLRP3 (Yu et al https://doi.org/10.1101/2023.11.07.566005), which was shown to be relevant for NLRP3 activation in cell and animal models. What remains novel and intriguing is the finding that NLRP3 activators induce an imbalance in the acylation-deacylation cycle by segregating NLRP3 in late Golgi/endosomes from de-acylating enzymes confined in the Golgi. The interesting hypothesis put forward by the authors is that the increased palmitoylation of cys130 would finally contribute to the activation of NLRP3. However, the authors should clarify the trafficking pathway of acylated-NLRP3. This pathway should, in principle, coincide with that of TGN46 which constitutively recycles from the TGN to the plasma membrane and is trapped in endosomes upon treatment with nigericin. 

      We think the data presented in our manuscript are consistent with the majority of S-acylated NLRP3 remaining on the Golgi via S-acylation in both untreated and nigericin-treated cells. We have performed an experiment with Brefeldin A (BFA), a fungal metabolite that disassembles the Golgi without causing dissolution of early endosomes, that further supports the conclusion that NLRP3 predominantly resides on Golgi membranes before and after activation. Treatment of cells with BFA prevents recruitment of NLRP3 to the Golgi in untreated cells and blocks the accumulation of NLRP3 on the structures seen in the perinuclear area after nigericin treatment (see new Supplementary Figure 4A-D). We do see some overlap of NLRP3 signal with TGN46 in the perinuclear area after nigericin treatment (see new Supplementary Figure 2E); however, this likely represents TGN46 at the Golgi rather than endosomes, given that the NLRP3 signal in this area is BFA sensitive. As with 2-BP and GFP-NLRP3C130S, GFP-NLRP3 spots also form in BFA / nigericin co-treated cells but not with untagged NLRP3. These spots also do not show any co-localisation with EEA1, suggesting that under these conditions, endosomes do not appear to represent a secondary site of NLRP3 recruitment in the absence of an intact Golgi. However, we cannot completely rule out that some NLRP3 may be recruited to endosomes at some point during its activation.

      (2) To affect the S-acylation, the authors used 16 hrs treatment with 2-bromopalmitate (2BP). In Figure 1f, it is quite clear that NLRP3 in 2-BP treated cells completely redistributed in spots dispersed throughout the cells upon nigericin treatment. What is the Golgi like in those cells? In other words, does 2-BP alter/affect Golgi morphology? What about PI4P levels after 2-BP treatment? These are important missing pieces of data since both the localization of many proteins and the activity of one key PI4K in the Golgi (i.e. PI4KIIalpha) are regulated by palmitoylation.

      We thank the reviewer for highlighting this point and agree that it is possible the observed loss of NLRP3 from the Golgi might be due to an adverse effect of 2-BP on Golgi morphology or PI4P levels. We have tested the effect of 2-BP on the Golgi markers GM130, p230 and TGN46. 2-BP has only marginal effects on Golgi morphology, with cis, trans and TGN markers all present at similar levels to untreated control cells (Supplementary Figure 2B-D). We also tested the effect of 2-BP on PI4P levels using mCherry-P4M, a PI4P biosensor. Surprisingly, as noted by the reviewer, despite recruitment of PI4K2A being dependent on S-acylation, PI4P was still present on the Golgi after 2-BP treatment, suggesting that a reduction in Golgi PI4P levels does not underlie loss of NLRP3 from the Golgi (Supplementary Figure 2A). The pool of PI4P still present on the Golgi following 2-BP treatment is likely generated by other PI4K enzymes that localise to the Golgi independently of S-acylation, such as PI4KIIIB. We have included these data in our manuscript as part of a new Supplementary Figure 2.

      (3) The authors argue that the spots observed with NLRP-GFP result from non-specific effects mediated by the addition of the GFP tag to the NLRP3 protein. However, puncta are visible upon nigericin treatment, as a hallmark of endosomal activation. How do the authors reconcile these data? Along the same lines, the NLRP3-C130S mutant behaves similarly to wt NLRP3 upon 2-BP treatment (Figure 1h). Are those NLRP3-C130S puncta positive for endosomal markers? Are they still positive for TGN46? Are they positive for PI4P?

      This is a fair point given the literature showing overlap of NLRP3 puncta formed in response to nigericin with endosomal markers, and the similarity in size and distribution between endosomes and the structures we see after 2-BP + nigericin treatment. We have tested whether these puncta overlap with EEA1, TGN46 or PI4P (Supplementary Figure 2A, E-G). The vast majority of spots formed by GFP-NLRP3 in cells co-treated with 2-BP and nigericin do not co-localise with EEA1, TGN46 or PI4P. This is consistent with these spots potentially being an artifact, although it has recently been shown that human NLRP3 unable to bind to the Golgi can still respond to nigericin (Mateo-Tórtola et al., 2023). These puncta might represent a conformational change that cytosolic NLRP3 undergoes in response to stimulation, although our results suggest that this does not happen on endosomes.

      (4) The authors expressed the minimal NLRP3 region to identify the domain required for NLRP3 Golgi localization. These experiments were performed in control cells. It might be informative to perform the same experiments upon nigericin treatment to investigate the ability of NLRP3 to recognize activating signals. It has been reported that PI4P increases on Golgi and endosomes upon NG treatment. Hence, all the differences between the domains may be lost or preserved. In parallel, also the timing of such recruitment upon nigericin treatment (early or late event) may be informative for the dynamics of the process and of the contribution of the single protein domains.

      This is an interesting point which we thank the reviewer for highlighting. However, we think that each domain on its own is not capable of responding to nigericin as shown by the effect of mutations in helix115-125 or the PB region in the full-length NLRP3 protein. NLRP3HF, which still contains a functional PB region, isn’t capable of responding to nigericin in the same way as wild type NLRP3 (Supplementary Figure 6C-D). Similarly, mutations in the PB region of full length NLRP3 that leave helix115-125 intact show that helix115-125 is not sufficient to allow enhanced recruitment of NLRP3 to Golgi membranes after nigericin treatment (Supplementary Figure 9A). We speculate that helix115-125, the PB region and the LRR domain all need to be present to provide maximum affinity of NLRP3 for the Golgi prior to encounter with and S-acylation by ZDHHC3/7. Mutation or loss of any one of the PB region, helix115-125 or the LRR lowers NLRP3 membrane affinity, which is reflected by reduced levels of NLRP3 captured on the Golgi by S-acylation at steady state and in response to nigericin. 

      (5) As noted above for the chemical inhibitors (1) the authors should check the impact of altering the balance between acyl transferase and de-acylases on the Golgi organization and PI4P levels. What is the effect of overexpressing PATs on Golgi functions?

      We have checked the effect of APT2 overexpression on Golgi morphology and can show that it has no noticeable effect, ruling out an impact of APT on Golgi integrity as the reason for loss of NLRP3 from the Golgi in the presence of overexpressed APT2. We have included these images as Supplementary Figure 11H-J. 

      It is plausible that the effects of ZDHHC3 or ZDHHC7 on enhanced recruitment of NLRP3 to the Golgi may be via an effect on PI4P levels since, as mentioned above, both enzymes are involved in recruitment of PI4K2A to the Golgi and have previously been shown to enhance levels of PI4K2A and PI4P on the Golgi when overexpressed (Kutchukian et al., 2021). However, NLRP3 mutants with most of the charge removed from the PB region, which are presumably unable to interact with PI4P or other negatively charged lipids, are still capable of being recruited to the Golgi by excess ZDHHC3. This would suggest that the effect of overexpressed ZDHHC3 on NLRP3 is largely independent of changes in PI4P levels on the Golgi and instead driven by helix115-125 and S-acylation at Cys-130. The latter point is supported by the observation that NLRP3HF and NLRP3Cys130 are insensitive to ZDHHC3 overexpression.

      At the levels of HA-ZDHHC3 used in our experiments with NLRP3 (200 ng pEF-Bos-HA-ZDHHC3 / ca. 180,000 cells) we do not see any adverse effect on Golgi morphology (Author response image 1), although it has been noted previously by others that higher levels of ZDHHC3 can have an impact on TGN46 (Ernst et al., 2018). ZDHHC3 overexpression surprisingly has no adverse effects on Golgi function and in fact enhances secretion from the Golgi (Ernst et al., 2018).

      Author response image 1.

      Overexpression of HA-ZDHHC3 does not impact Golgi morphology. A) Representative confocal micrographs of HeLaM cells transfected with 200 ng HA-ZDHHC3 fixed and stained with antibodies to STX5 or TGN46. Scale bars = 10 µm. 

      Reviewer #2 (Public Review):

      Summary:

      This paper examines the recruitment of the inflammasome seeding pattern recognition receptor NLRP3 to the Golgi. Previously, electrostatic interactions between the polybasic region of NLRP3 and negatively charged lipids were implicated in membrane association. The current study reports that reversible S-acylation of the conserved Cys-130 residue, in conjunction with upstream hydrophobic residues plus the polybasic region, act together to promote Golgi localization of NLRP3, although additional parts of the protein are needed for full Golgi localization. Treatment with the bacterial ionophore nigericin inhibits membrane traffic and prevents Golgi-associated thioesterases from removing the acyl chain, causing NLRP3 to become immobilized at the Golgi. This mechanism is put forth as an explanation for how NLRP3 is activated in response to nigericin.

      Strengths:

      The experiments are generally well presented. It seems likely that Cys-130 does indeed play a previously unappreciated role in the membrane association of NLRP3.

      Weaknesses:

      The interpretations about the effects of nigericin are less convincing. Specific comments follow.

      (1) The experiments of Figure 4 bring into question whether Cys-130 is S-acylated. For Cys130, S-acylation was seen only upon expression of a severely truncated piece of the protein in conjunction with overexpression of ZDHHC3. How do the authors reconcile this result with the rest of the story?

      Providing direct evidence of S-acylation at Cys-130 in the full-length protein proved difficult. We attempted to detect S-acylation of this residue by mass spectrometry. However, the presence of the PB region and multiple lysines / arginines directly after Cys-130 made this approach technically challenging and we were unable to convincingly detect S-acylation at Cys-130 by M/S. However, Cys-130 is clearly important for membrane recruitment as its mutation abolishes the localisation of NLRP3 to the Golgi. It is feasible that it is the hydrophobic nature of the cysteine residue itself which supports localisation to the Golgi, rather than S-acylation of Cys-130. A similar role for cysteine residues present in SNAP-25 has been reported (Greaves et al., 2009). However, the rest of our data are consistent with Cys-130 in NLRP3 being S-acylated. We also refer to another recently published study which provides additional biochemical evidence that mutation of Cys-130 impacts the overall levels of NLRP3 S-acylation (Yu et al., 2024). 

      (2) Nigericin seems to cause fragmentation and vesiculation of the Golgi. That effect complicates the interpretations. For example, the FRAP experiment of Figure 5 is problematic because the authors neglected to show that the FRAP recovery kinetics of nonacylated resident Golgi proteins are unaffected by nigericin. Similarly, the colocalization analysis in Figure 6 is less than persuasive when considering that nigericin significantly alters Golgi structure and could indirectly affect colocalization. 

      We agree that it is likely that the behaviour of other Golgi resident proteins is altered by nigericin. This is in line with a recent proteomics study showing that nigericin alters the amount of Golgi resident proteins associated with the Golgi (Hollingsworth et al., 2024) and other work demonstrating that changes in organelle pH can influence the membrane on / off rates of Rab GTPases (Maxson et al., 2023). However, Golgi levels of other peripheral membrane proteins that associate with the Golgi through S-acylation, such as N-Ras, appear unaltered (Author response image 2), indicating a degree of selectivity in the proteins affected. Our main point here is that NLRP3 is amongst those proteins whose behaviour on the Golgi is sensitive to nigericin and that this change in behaviour may be important to the NLRP3 activation process, although this requires further investigation and will form the basis of future studies.

      The reduction in co-localisation between NLRP3 and APT2, due to alterations in Golgi organisation and trafficking, was the point we were trying to make with this figure, and we apologise if this was not clear. We think that the changes in Golgi structure and function caused by nigericin potentially affect the ability of APT2 to encounter NLRP3 and de-acylate it. We have added a new paragraph to the results section to explain this more clearly. We recognise that our results supporting this hypothesis are at present limited, and we have toned down the language used in the results section to reflect the nature of these findings.

      Author response image 2.

      S-acylated peripheral membrane proteins show differential sensitivity to nigericin. A) Representative confocal micrographs of HeLaM cells coexpressing GFP-NRas and an untagged NLRP3 construct. Cells were left untreated or treated with 10 µM nigericin for 1 hour prior to fixation. Scale bars = 10 µm. B) Quantification of GFP-NRas or NLRP3 signal in the perinuclear region of cells treated with or without nigericin.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Does overnight 2-BP treatment potentially have indirect effects that could prevent NLRP3 recruitment? It would be useful here to show some sort of control confirming that the cells are not broadly perturbed.

      Please see our response to point (2) raised by reviewer #1 which is along similar lines. 

      (2) In Figure 5, "Veh" presumably is short for "Vehicle". This term should be defined in the legend.

      We have now corrected this.

      References

      Ernst, A.M., S.A. Syed, O. Zaki, F. Bottanelli, H. Zheng, M. Hacke, Z. Xi, F. Rivera-Molina, M. Graham, A.A. Rebane, P. Bjorkholm, D. Baddeley, D. Toomre, F. Pincet, and J.E. Rothman. 2018. S-Palmitoylation Sorts Membrane Cargo for Anterograde Transport in the Golgi. Dev Cell. 47:479-493 e477.

      Greaves, J., G.R. Prescott, Y. Fukata, M. Fukata, C. Salaun, and L.H. Chamberlain. 2009. The hydrophobic cysteine-rich domain of SNAP25 couples with downstream residues to mediate membrane interactions and recognition by DHHC palmitoyl transferases. Mol Biol Cell. 20:1845-1854.

      Hollingsworth, L.R., P. Veeraraghavan, J.A. Paulo, J.W. Harper, and I. Rauch. 2024. Spatiotemporal proteomic profiling of cellular responses to NLRP3 agonists. bioRxiv.

      Kutchukian, C., O. Vivas, M. Casas, J.G. Jones, S.A. Tiscione, S. Simo, D.S. Ory, R.E. Dixon, and E.J. Dickson. 2021. NPC1 regulates the distribution of phosphatidylinositol 4-kinases at Golgi and lysosomal membranes. EMBO J. 40:e105990.

      Mateo-Tórtola, M., I.V. Hochheiser, J. Grga, J.S. Mueller, M. Geyer, A.N.R. Weber, and A. Tapia-Abellán. 2023. Non-decameric NLRP3 forms an MTOC-independent inflammasome. bioRxiv:2023.2007.2007.548075.

      Maxson, M.E., K.K. Huynh, and S. Grinstein. 2023. Endocytosis is regulated through the pH-dependent phosphorylation of Rab GTPases by Parkinson’s kinase LRRK2. bioRxiv:2023.2002.2015.528749.

      Yu, T., D. Hou, J. Zhao, X. Lu, W.K. Greentree, Q. Zhao, M. Yang, D.G. Conde, M.E. Linder, and H. Lin. 2024. NLRP3 Cys126 palmitoylation by ZDHHC7 promotes inflammasome activation. Cell Rep. 43:114070.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Strengths

      We thank the reviewer for recognizing the strengths of our in vivo Ca2+ measurements, super resolution microscopy and assessment of the secretory dysfunction in the Sjögren's syndrome mouse model.

      Weaknesses

      Point 1: The less restricted Ca2+ signal to the apical region of the acinar cell is not really relevant to the reduced activation of TMEM16a by a local signal at the apical plasma membrane.

      We agree that the spatially averaged Ca2+ signal is not indicative of the local Ca2+ signal that activates TMEM16a. The description of the disordered Ca2+ signal in the disease model was intended to simply convey that the Ca2+ signal is altered in the model. Whether or indeed how the altered spatial characteristics of the signal are deleterious is not known but we speculate in the discussion that this contributes to the ultrastructural damage observed.

      Point 2. Secretion is decreased but the amplitude of the globally averaged Ca2+ signals are increased. No proof is offered that the greater distance between IP3R and TMEM16a is the reason for decreased secretion in the face of this increased peak signal.

      We have now added new data indicating that the local Ca2+ signal is indeed disrupted in the disease model. We show that in control animals, activation of TMEM16a by application of agonist occurs when the pipette is buffered with the slower buffer EGTA but not with the fast buffer BAPTA. In contrast, in cells isolated from DMXAA-treated animals, both EGTA and BAPTA abolish the agonist-induced currents (new Figure 6). These data are consistent with our super resolution data showing that the distance between IP3R and TMEM16a is greater in the disease model, which is presumably enough to allow buffering of Ca2+ released from IP3R such that it does not effectively activate TMEM16a. These data also suggest that the increased amplitude of the spatially averaged Ca2+ signal is not sufficient to overcome this structural change.

      Point 3. Lack of evidence that the mitochondrial changes are associated with the defect in fluid secretion.

      We agree that a causal link between the decreased secretion and altered mitochondrial morphology and function is not established. Nevertheless, we feel it is reasonable to contend that profound changes in mitochondrial morphology observed at the light and EM level, together with changes in mitochondrial membrane potential and oxygen consumption are consistent with contributing to altered fluid secretion given that this is an energetically costly process. We have altered the discussion to reflect these caveats and ideas.

      Reviewer 2:

      We thank the reviewer for their assessment of our work and constructive comments.

      Reviewer 3:

      We thank the reviewer for their careful appraisal of our manuscript and insightful comments. 

      Point 1: Are all the effects of DMXAA mediated through the STING pathway?

      This is an important point because, as noted, DMXAA has been reported to inhibit NAD(P)H quinone oxidoreductase, which could contribute to the phenotype reported here. In future studies we intend to test other STING pathway agonists such as MSA-2 and perhaps antagonists of the STING pathway. We have added text to the discussion indicating that not all of the effects observed may be a result of activation of the STING pathway.

      Point 2: As noted, and clarified in the text, the driving force for ATP production is the electrochemical H+ gradient which establishes the mitochondrial membrane potential.

      Point 3: The reviewer suggested there was a decrease in mitochondrial membrane potential in the absence of a change in TMRE steady state.

      We apologize for the confusion generated by the presentation of the figure. We normalized TMRE fluorescence against MitoTracker Green fluorescence but, as shown, the figure does not reflect that the absolute TMRE fluorescence was indeed decreased. Supplemental figure 4 now shows the basal TMRE fluorescence.

      Point 4: Indications that the disruption to ER structure seen in Electron Micrographs contributes to the changes in Ca2+ signal and fluid secretion.

      We did not focus on the relative distance between ER and apical PM in the EMs primarily because the ER that projects towards the apical PM is a relatively minor component of the specialized ER expressing IP3R and is difficult to identify. We note that the disruption of the bulk ER as quantitated by altered ER-mitochondrial interfaces and fragmentation is consistent with our super resolution data and thus likely plays a role in the mechanism that results in dysregulated Ca2+ signals and reduced secretion.

      Recommendations to Authors:

      Reviewing Editor:

      (1) The Editor suggests that we should use the activity of TMEM16a to directly measure the [Ca2+] experienced by the channel.

      We now present new additional data.  First, we show an extended range of pipette [Ca2+] demonstrating identical Ca2+ sensitivity in DMXAA vs vehicle treated cells (Figure 5). Second, importantly, we now present data evaluating the ability of muscarinic stimulation to activate TMEM16a in the presence of either EGTA (slow Ca2+ buffer) or BAPTA (fast Ca2+ buffer). Notably, currents can be stimulated in control cells when the pipette is buffered with EGTA, but not in DMXAA treated cells. BAPTA inhibits activation in both situations (new Figure 6). These data are consistent with TMEM16a being activated by Ca2+ in a microdomain and that this is disrupted in the disease model.   

      (2) The Editor asks whether a decrease in IP3R3 in a subset of the samples could account for the decreased fluid secretion.

      We think this is unlikely given, as noted by the Editor, that a reduction only occurred in a subset of the samples and statistically there was no significant difference to vehicle-treated animals. Moreover, we would note that there is also no difference in the expression of IP3R2 between experimental groups and in studies of transgenic mice where either IP3R2 or IP3R3 were knocked out individually, there was no effect on salivary fluid secretion, indicating that expression of a single subtype can support stimulus-secretion coupling.

      (3) Absolute values for changes in fluorescence (over time) should be included together with SD images.

      These have been added in Figure 3.

      (4) DMXAA has additional effects to STING activation and thus other STING pathway modulators should be used.

      We agree that additional STING agonists should be explored in the future but believe that this is beyond the scope of the present studies. Additional text has been added to the discussion acknowledging the additional targets of DMXAA and that they could contribute to the phenotype.

      (5) No causal link between the observed Ca2+ changes and mitochondrial dysfunction.

      We agree that no experimental evidence is offered to directly support this contention. Nevertheless, dysregulated Ca2+ signals are well-documented to lead to altered mitochondrial structure and function, and thus we feel it is not unreasonable to speculate that this is a possibility.

      (6) The paper would be improved by directly assessing mechanistic connections between altered Ca2+ signaling and TMEM16a activation.

      We agree, please refer to point 1 and new figure 6.

      Reviewer 1:

      (1) Standard Deviation images should be explained and the location of ROI identified.

      We contend that Standard Deviation images provide an effective visualization (in a single image) of both the magnitude of the Ca2+ increase and the degree of recruitment of cells in the field of view during the entire period of stimulation.  We have added text to describe the utility of this technique. Nevertheless, we now show kinetic traces of the changes in fluorescence over time in both apical and basal regions in Figure 3. We also clarify that the traces shown in Figure 2 are averaged over the entire cell. 
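As an illustration only, a standard deviation image of the kind described above can be sketched as the per-pixel standard deviation of fluorescence across all frames of a recording. This is a minimal sketch, not the authors' analysis pipeline: the function name `sd_image`, the toy data, and the choice of population (rather than sample) standard deviation are all assumptions.

```python
from statistics import pstdev


def sd_image(stack):
    """Return the per-pixel standard deviation across frames.

    stack: list of frames; each frame is a list of rows of fluorescence values.
    Assumption: population standard deviation is used for every pixel.
    """
    n_rows = len(stack[0])
    n_cols = len(stack[0][0])
    return [
        [pstdev([frame[r][c] for frame in stack]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 3-frame recording of a 1x2-pixel field:
# pixel 0 never changes, pixel 1 rises during stimulation.
stack = [[[10.0, 10.0]], [[10.0, 20.0]], [[10.0, 30.0]]]
image = sd_image(stack)
```

In this toy case the unchanging pixel maps to zero and the responding pixel to a positive value, which is why a single image can convey both how strongly and how widely cells in the field responded.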

      (2) The Authors should consider that reduced secretion is because cells are dying.

      We believe this is unlikely given the lack of morphological changes in glandular structure and the minor lymphocyte infiltration observed in this model. Nevertheless, we now add data showing that the mass of SMG is not altered in the DMXAA-treated animals compared with vehicle-treated (Figure 1E).

      (3) The role of mitochondria in the DMXAA phenotype is unclear. What is the effect of acutely de-energizing mitochondria on fluid secretion.

      Since fluid secretion is an energetically expensive undertaking, it is not unreasonable to suggest that compromised mitochondrial function may impact secretion. That said, this could occur at multiple levels: production of ATP to fuel the Na/K pump to establish membrane gradients, or to provide energy to sequester Ca2+, among a multitude of targets. This will be a subject of ongoing experiments. We contend that experiments to acutely disrupt salivary mitochondria in vivo while assessing fluid secretion would be difficult to perform and interpret, given that local administration of agents to SMG would not affect the other major salivary glands and systemic administration would be predicted to have wide-ranging off-target effects.

      (4) Could a subset of cells with low IP3R numbers contribute to reduced fluid secretion?

      Please see the response to the Reviewing Editor's point 2.

      (5) An attempt to estimate the effect of the spatial disruption of IP3R and TMEM16a localization should be made.

      Please see the response to the Reviewing Editor's point 1.

      Minor Points

      We have amended the statement from “Highly expressed” to “increased”.

      Regions of the cell have been labelled for orientation in the line scans.

      The molecular weight markers have been added in Figure 4.

      Reviewer 2:

      (1) Whether mitochondrial dysfunction is the initiator of the phenotype or a result of the dysregulated Ca2+ signal is unclear.

      We agree that our data do not resolve this classic “chicken vs egg” conundrum. We plan further experiments to address this issue. Future plans include repeating the mitochondrial and Ca2+ signaling experiments at earlier time points where we know fluid secretion is not yet impacted. This may reveal the temporal sequence of events. Similarly, we plan experiments to address mechanistically why the global Ca2+ signal is augmented: reduced Ca2+ clearance or enhanced Ca2+ release/influx are possibilities. We speculate that reduced Ca2+ clearance, either because mitochondrial Ca2+ uptake is reduced or as a secondary consequence of reduced ATP levels on SERCA and PMCA, is a likely possibility.

      (2) Measurement of ECAR and direct measurements of ATP and Seahorse methods.

      In a separate series of experiments, we monitored ECAR. These data were unfortunately very variable and difficult to interpret, although no obvious compensatory increase was observed. We plan in the future to directly monitor ATP levels in acinar cells using Mg-Green. To normalize for cell numbers in the Seahorse experiments, following centrifugation, cell pellets of equal volume were resuspended in equal volumes of buffer. Acinar cells were seeded onto Cell-Tak coated dishes. This information has been added to the Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      (1) When introducing the different antibody clones recognizing Pan, oxidized, or reduced forms, please clearly indicate which clone number belongs to which form.  

      - We see where the original language could be confusing. Please see our new introduction to the antibodies used.

      “we evaluated the redox state of La in fusing osteoclasts using recently validated monoclonal α-La antibodies that recognize oxidized La (clone 7B6) or reduced La (clone 312B), or do not distinguish between these La species (Pan, clone 5B9)”

      (2) "Finding that the surface La pool, which promotes multinucleation in osteoclasts, is an oxidized species..." I would suggest rewording as "...is enriched in oxidized species".  

      - Agreed. We have edited the sentence as follows.

      “Finding that the surface La pool, which promotes multinucleation in osteoclasts, is enriched in an oxidized species raised the question”

      (3) Although not necessary to support the conclusions of the manuscript, it would be interesting to know if the application of La194-408 to osteoclast progenitors following NAC treatment results in the rescue of La staining at the cell surface, or if this exogenous La is acting independently from cell surface association.  

      - We agree that this is an interesting idea. We previously demonstrated that we could add La 1-375 to osteoclast progenitors following RANKL addition and promote osteoclast fusion. We also demonstrated that La 1-375 under these conditions enriched La surface staining (PMID: 36739273).

      - Therefore, we hypothesize that La 194-408 would act similarly.

      (4) Is the conformation of La modified by the conversion of Cys 232 and 245 to alanine? What about the potential to form oligomers?

      - To directly answer the Reviewer’s question – we simply do not know and do not have a simple way to test this. To speculate, the differential recognition of La that is reduced vs oxidized by the antibodies used here (specifically clone 312b vs clone 7b6) suggests that some conformational change is taking place when redox signaling modifies La in osteoclasts. Moreover, in Supp. Fig. 4b, we show that recombinant La 194-408 does form a small amount of dimer under our conditions while La 194-408 Cys 232 and 245 to Ala does not. These data together weakly support that La, when converted from reduced to oxidized forms or when we artificially mutate Cys 232 and 245 to Ala, undergoes some conformational and oligomeric change. However, we are not comfortable making such claims in the manuscript currently and prefer to investigate this with more rigor and comment on the biological significance of these potential changes in the future.

      (5) "In conclusion, in this study, we identified redox signaling as a molecular switch that redirects La protein away from the nucleus, where it protects precursor tRNAs from exonuclease digestion, and towards its osteoclast-specific function at the cell surface..." I would suggest rewording this sentence given that there is no evidence that the function of oxidized La at the cell surface is osteoclast-specific. This phenomenon could be applicable to other cell types and other biological processes.  

      - The Reviewer makes a good point here, which we very much appreciate. We hoped to communicate that this was a unique function of La that was different from the well-recognized role this protein plays in RNA metabolism, but we overstated the point beyond our intention. Please see where we have modified this statement to read:

      “In conclusion, in this study, we identified redox signaling as a molecular switch that redirects La protein away from the nucleus, where it protects precursor tRNAs from exonuclease digestion, and towards its separable function at the osteoclast surface, where La regulates the multinucleation and resorptive functions of these managers of the skeleton.”

      (6) In methods, the definition of TCEP is missing a closed parenthesis sign.  

      - Thank you, corrected.

      (7) In methods under "Cells" there is a missing superscript in 1x106 cells/ml. Presumably, this is 1x10e6.   

      - Thank you, corrected.

      (8) Please provide the sequences of primers used for RT-PCR in this study.  

      - Understood. Please see where a table of all primer sequences used has been added to the Methods under the Transcript Analysis section.

      (9) In methods, "Bone resorption" should be relabeled given that the osteoclasts are plated on calcium phosphate plates and not on a bone surface.

      - Thank you. Please see where in the Methods both the title and all references to “bone resorption” in the method description have now been changed to “mineral resorption”.

      (10) In several figures, it would be more appropriate to correct for multiple comparisons in the statistical analyses.  

      - We appreciate this concern. Please see where Fig. 2b,c; Fig. 3b,c; Fig. 4d; Fig. 5b,d; and Fig. 6d have been reanalyzed using paired one-way ANOVAs corrected for multiple comparisons. Now all data where t-tests are used to evaluate statistical significance evaluate differences between only 2 values, and all experiments considering 3+ values are compared using one-way ANOVAs corrected for multiple comparisons.

      (11) Figure 5: Panels D and E are flipped relative to the legend. Please also define the reagent used for ROS signal in the legend.  

      - Thank you. D and E are now corrected and we added “(Grey = CellRox Dye)” to the end of the legend for Fig. 5a.

      (12) Supplemental Figure 5c: in the control condition, why are some nuclei not staining with the reduced La antibody?  

      - Great question, direct answer – we simply do not know.  

      Longer answer, this image is in fact representative and not exclusive to the reduced La antibody (clone 312b). When we look at La staining in mature, multinucleated osteoclast nuclei at later timepoints post fusion using even pan antibodies, we find that its localization to the nuclei of syncytial osteoclasts is not uniform, but that nuclear La preferentially enriches in some mature osteoclast nuclei and seems to be excluded from others. This may suggest that – akin to myonuclei in skeletal muscle – osteoclast nuclei in a syncytium are not all equal. However, we are far, far away from being able to make any conclusions from the data we have.

      (13) Figure 7 legend: consider breaking this legend up into multiple sentences.  

      - Thank you for the suggestion. The legend for Figure 7 has been rewritten.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Can the authors use the official name of La protein in NCBI GENE and PROTEIN?  

      - While some in the field refer to lupus La protein as La protein, we choose to refer to it simply as La, as is common throughout the Lupus La Protein literature. It is our opinion that continuously referring to a protein as a name + the word protein throughout the manuscript is unnecessary and alters the flow of our manuscript’s points.

      Thanks. We have included the official name of human La in NCBI GENE (SSB small RNA binding exonuclease protection factor La, Gene ID 6741, NCBI GENE) in the revised text.

      (2) The references 26 and 27 are not representative. The pioneering work from Mundy, Chambers, and Almeida (PMID 2312718, 15528306, and 24781012) should be cited.

      - Thanks. We have added these 3 references to better acknowledge these significant contributions.

      (3) It is hard to understand Figure 2. What are the white arrows in Figure 2a pointing to? In Figure 2b, what do the columns a-La (Red), a-La (Pan), and a-La (Ox) mean: treatment or staining? In Figure 2c, the legend "conditions where surface proteins are oxidized (TCEP)" seems like it should read "deoxidized".

      - We agree. We now realize this legend was rather confusing. It has been edited to read:

      “(a) Representative fluorescence and DIC confocal micrographs of primary human osteoclasts following synchronized cell-cell fusion where hemifusion inhibitor was left (Inhibition), removed (Wash) or removed but the α-La antibodies indicated were simultaneously added.

      Cyan=Hoechst Arrows=Multinucleated Osteoclasts (b) Quantification of a.”

      - Thanks. 2c has now been corrected to “reduced” rather than the errant “oxidized”.

      (4) How do authors normalize bone resorption, % of total area?  

      - We normalized to a separate, paired well where monocytes are differentiated to precursors (MCSF), but no RANKL is added. We have added this omitted information to the methods sections for our mineral resorption assay.

      (5) Figure 5. There are two legends (b). In Figure 5c RT-qPCR, the DC-STAMP or OC-STAMP and mature osteoclast marker calcitonin receptor should be included.

      - Thank you. There were several problems with Figure legend 5 that both you and Reviewer #1 brought our attention to. We have now corrected these errors.

      - We understand the Reviewer’s interest in these markers. However, our point is that the steady-state transcript levels of two well-recognized osteoclast differentiation factors and the fusion regulator La, which our manuscript focuses on, are not significantly altered by NAC treatment at these later, fusion-associated timepoints. While DC-STAMP, OC-STAMP, and Calcitonin would be interesting, we believe they are outside the scope of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HMGCS1, 3-hydroxy-3-methylglutaryl-CoA synthase1 is predicted to be involved in Acetyl-CoA metabolic process and mevalonate-cholesterol pathway. To induce diet-induced diabetes, they fed wild-type littermates either a standard chow (Control) or a high fat-high sucrose (HFHG) diet, where the diet composition consisted of 60% fat, 20% protein, and 20% carbohydrate (H10060, Hfkbio, China). The dietary regimen was maintained for 14 weeks. Throughout this period, body weight and fasting blood glucose (FBG) levels were measured on a weekly basis. Although the authors induced diabetes with a diet also rich in fat, the cholesterol concentration or metabolism was not investigated. After the treatment, were the animals with endothelial dysfunction? How was the blood pressure of the animals?

      Thank you for your comments and kind suggestions. We have conducted a study on the impact of the HFHG diet on the serum levels of total cholesterol (T-CHO) in mice over a 14-week period. Our findings indicated that the HFHG diet significantly elevated T-CHO levels in the serum of mice (Supplementary Figure 5E). Additionally, the HFHG diet was associated with an increase in blood pressure (Figure 5F) and exacerbated the progression of endothelial dysfunction in mice (Figure 5H-L).

      Strengths:

      To explore the potential role of circHMGCS1 in regulating endothelial cell function, the authors cloned exons 2-7 of HMGCS1 into lentiviral vectors for ectopic overexpression of circHMGCS1 (Figure S2). The authors could use this experiment as a proof of concept and investigate the glucose concentration in the cell culture medium. Is the pLV-circHMGCS1 transduction in HUVEC increasing the glucose release? (Line 163)

      In the manuscript, we utilized a DMEM culture medium containing 4500 mg/L glucose. Given that the HUVEC cell culture is glucose-dependent for its metabolic processes, it was challenging to precisely evaluate the relationship between pLV-circHMGCS1 transduction and the glucose concentration in the medium.

      Weaknesses:

      (1) Pg 20. The cells were transfected with miR-4521 mimics, miR-inhibitor, or miR-NC and incubated for 24 hours. Subsequently, the cells were treated with PAHG for another 24 hours. Were the cells transfected with lipofectamine? The protocol or the lipofectamine kit used should be described. The lipofectamine protocol suggests using an incubation time of 72 hours. Why did the authors incubate for only 24 hours? If the authors did the mimic and inhibitor curves, these should be added to the supplementary figures. Please describe the miRNA mimic and antagomir concentration used in cell culture.

      For detailed transfection methods of the miRNA mimic and its inhibitor, please refer to “Transfection of miRNA mimic or inhibitor” (Line 587) in the revised Experimental Section. We employed the Hieff Trans® siRNA/miRNA in vitro transfection reagent (Yeasen, China, 40806ES03), with a transfection duration of 48 h. The miR-4521 content in HUVEC post-transfection was quantified using qRT-PCR. Transfection of the miR-4521 mimic for 48 h notably enhanced its expression in HUVEC (Supplementary Figure 3B), whereas transfection of the miR-4521 inhibitor for the same duration significantly suppressed its expression (Supplementary Figure 3C). The concentration used for both miRNA mimic and inhibitor transfection was 50 nM. In the revised manuscript, we have corrected the transfection time and clarified that we did not utilize miRNA antagomirs in our experiments.

      (2) Pg 20, line 507. What was the miR-4521 agomiR used to treatment of the animals?

      A miRNA agomir is a valuable experimental tool for elucidating miRNA function, used to simulate the overexpression of a specific miRNA. It is a chemically modified RNA molecule identical in sequence to the target miRNA, engineered for enhanced stability and transfection efficacy. Using a miRNA agomir enables overexpression of the target miRNA, facilitating the investigation of miRNA functions and mechanisms in vivo. In our study, we employed a miRNA mimic for cellular studies and a miRNA agomir for in vivo applications to achieve high expression of the miRNA (Fu et al, 2019).

      (3) Figure 1B. The results are showing the RT-qPCR for only 5 circRNA, however, the results show 48 circRNAs were upregulated, and 18 were downregulated (Figure S1D). Why were the other circRNAs not confirmed? The circRNAs upregulated with high expression are not necessarily with the best differential expression comparing control vs. PAHG groups. Furthermore, Figure 1A and S1D show circRNAs downregulated also with high expression. Why were these circRNAs not confirmed?

      Our study aims to identify potential biomarkers for endothelial dysfunction in type 2 diabetes. To this end, we focused on circRNAs that exhibited significant upregulation following PAHG treatment. In our sequencing data, the p-values for these top upregulated circRNAs were notably below the threshold of 0.001, prompting their selection for further validation. We employed qRT-PCR to ascertain the consistency of their expression levels with the RNA-sequencing findings. Among these, circHMGCS1 was identified as a promising candidate with regulatory potential in endothelial dysfunction. The circRNAs that were significantly downregulated will be the subject of our ongoing research.

      (4) Figure 1B shows the relative circRNAs expression. Were host genes expressed in the same direction?

      circRNAs are generated from specific exons or introns of their host genes, either individually or in combination, and the main function of a circRNA depends on its non-coding RNA characteristics. The expression levels of circRNAs are not necessarily correlated with those of their host genes, and similarly, the function of a circRNA does not inherently relate to the function of its host gene (Kristensen et al, 2019; Liu & Chen, 2022). Consequently, the data presented in Figure 1B were primarily aimed at validating the accuracy of circRNA-seq. Although we did not conduct host gene expression analysis for the identified circRNAs, our subsequent results indicated that the overexpression of circHMGCS1 did not influence the expression levels of HMGCS1 (Figure 2A).

      (5) Line 128. The circRNA RT-qPCR methodology was not described. The methodology should be described in detail in the Methods Section.

      The circRNA RT-qPCR method differs from that used for other genes only in the reverse transcription step, where random primers must be used. Unlike linear RNAs that possess a 3' polyA tail, which allows for the use of oligo(dT) primers, circRNAs require random primers to initiate reverse transcription. Beyond this distinction, the procedure is the same as standard qRT-PCR. We have revised the “Isolation of RNA and miRNA for quantitative Real Time-PCR (qRT-PCR) analysis” method in the revised version (Line 695).

      (6) Line 699. The relative gene expression was calculated using the 2-ΔΔCt method. This is not correct, the expression for miRNA and gene expression are represented in percentage of control.

      We initially employed the 2^-ΔΔCt method to ascertain the relative gene expression levels. Subsequently, we scaled all values by a factor of 100 to amplify the visual representation of the observed variations, thereby enhancing the visualization of the data.
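      The calculation described in this response can be sketched as follows. This is a minimal illustration only, assuming a single reference gene and one control group; the function names and the Ct values in the usage note are hypothetical and are not taken from the manuscript.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    # dCt: normalize the target gene to the reference gene within each group
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # ddCt: compare the treated sample with the control group
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def percent_of_control(fold_change):
    """Scale a fold change by 100 so that the control group reads 100%."""
    return 100.0 * fold_change


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only
    fold = relative_expression(24.0, 18.0, 25.0, 18.0)
    print(fold, percent_of_control(fold))
```

      Under this sketch the two presentations carry the same information: a control sample always gives a fold change of 1, which becomes 100 after scaling, so plotting "percentage of control" is a rescaling of the 2^-ΔΔCt result rather than a different method.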

      (7) Line 630. Detection of ROS for tissue and cells. The methodology for tissue was described, but not for cells.

      We have added the detailed description of the cellular ROS detection methods in the revised manuscript as follows:

      For ROS detection in cells, the treated cells were washed once with PBS, incubated with 20 μM DHE at 37°C for 30 min in the dark, washed three times with PBS, and placed in colorless DMEM medium before observation by fluorescence microscopy (Line 640-643).

      (8) Line 796. RNA Fluorescent In Situ Hybridization (RNA-FISH). Figure 1F shows that the RNA-Fluorescence in situ hybridization (RNA-FISH) confirmed the robust expression of cytoplasmic circHMGCS1 in HUVECs (Figure 1F). However, in the methods, lines 804 and 805 described the probes targeting circMAP3K5 and miR-4521 were applied to the sections. Hybridization was performed in a humid chamber at 37C overnight. Is it correct?

      We have made a correction in the revised manuscript. The corrected description is "the probes targeting circHMGCS1 and miR-4521 were applied to the sections" (Line 816).

      (9) Line 14. Fig 1-H. The authors discuss qRT-PCR demonstrated that circHMGCS1 displayed a stable half-life exceeding 24 h, whereas the linear transcript HMGCS1 mRNA had a half-life less than 8 h (Figure 1H). Several of the antibodies may contain trace amounts of RNases that could degrade target RNA and could result in loss of RNA hybridization signal or gene expression. Thus, all of the solutions should contain RNase inhibitors. The HMGCS1 mRNA expression could be degraded over the incubation time (0-24hs) leading to incorrect results. Moreover, in the methods is not mentioned if the RNAse inhibitor was used. Please, could the authors discuss and provide information?

      This experiment was performed in cell culture as described in our Experimental Methods (Line 753), where we added actinomycin D directly into the cell culture well plates, and the cells remained in a healthy state during this treatment. We did not directly extract mRNA from cells for this experiment. Additionally, all solutions used throughout the experiment were prepared using RNase-free water, ensuring the integrity of the mRNA.
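      For readers unfamiliar with how a half-life is read from an actinomycin D time course, one common approach is sketched below. It assumes simple first-order decay after transcription is blocked and fits a straight line to the log of the remaining transcript fraction. The manuscript reports only that the half-life exceeds 24 h for circHMGCS1 and is below 8 h for HMGCS1 mRNA; the fitting procedure, the function name, and the numbers in the usage note are assumptions for illustration.

```python
import math


def decay_half_life(times_h, fraction_remaining):
    """Estimate a half-life (hours) assuming first-order decay.

    Fits ln(fraction remaining) = -k * t + c by ordinary least squares
    and returns ln(2) / k. Returns infinity if no decay is detected.
    """
    logs = [math.log(f) for f in fraction_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying transcript
    if slope >= 0:
        return math.inf
    return math.log(2) / (-slope)


if __name__ == "__main__":
    # Synthetic time course (hours, fraction of the t = 0 signal)
    print(decay_half_life([0, 4, 8, 12], [1.0, 0.5, 0.25, 0.125]))
```

      A transcript whose signal barely changes over the chase period yields a very long or infinite estimate under this sketch, which is why stable circRNAs are often reported only as exceeding the longest time point measured.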

      (10) Further experiments demonstrated that the overexpression of circHMGCS1 stimulated the expression of adhesion molecules (VCAM1, ICAM1, and ET-1) (Figures 2B and 2C), suggesting that circHMGCS1 is involved in VED. How were these genes expressed in the RNA-seq?

      In the manuscript, we focused exclusively on circRNA and miRNA sequencing and did not perform mRNA sequencing. Consequently, we employed qRT-PCR and Western blot to assess the expression changes of ET-1, ICAM1, and VCAM1 at the gene and protein levels. The findings revealed that the overexpression of circHMGCS1 significantly upregulated the expression of adhesion molecules (VCAM1, ICAM1, and ET-1).

      (11) Line 256. By contrast, the combined treatment of circHMGCS1 and miR-4521 agomir did not significantly affect the body weight and blood glucose levels. OGTT and ITT experiments demonstrated that miR-4521 agomir considerably enhanced glucose tolerance and insulin resistance in diabetic mice (Figures 5C, 5D, and Figures S5B and S5C). Why did the miR-4521 agomir treatment considerably enhance glucose tolerance and insulin resistance in diabetic mice, but not the blood glucose levels?

      Our results showed that miR-4521 agomir could effectively suppress the increase of body weight and blood glucose in mice (Figure 5A-B).

      (12) In the experiments related to pull-down, the authors performed Biotin-coupled miR-4521 or its mutant probe, which was employed for circHMGCS1 pull-down. This result only confirms the Luciferase experiments shown in Figure 4A. The experiment that the authors need to perform is pull-down using a biotin-labeled antisense oligo (ASO) targeting the circHMGCS1 backsplice junction sequence followed by pulldown with streptavidin-conjugated magnetic beads to capture the associated miRNAs and RNA binding proteins (RBPs). Also, the ASO pulldown assay can be coupled to miRNA RT-qPCR and western blotting analysis to confirm the association of miRNAs and RBPs predicted to interact with the target circRNA.

      This point is correct. As suggested, we utilized a biotin-labeled circHMGCS1 probe for pull-down experiments. Because circRNA-miRNA interactions are mainly mediated by the RNA-induced silencing complex, which includes Argonaute 2 (AGO2), we examined the levels of miR-4521 and AGO2 in the captured material. Our results demonstrated that circHMGCS1 significantly captured miR-4521 in the cells, with a concomitant capture of AGO2. These findings have been integrated into the revised manuscript (Supplementary Figures 4D and 4E).

      (13) In Figure 5, the authors showed that the results suggest that miR-4521 can inhibit the occurrence of diabetes, whereas circHMGCS1 specifically dampens the function of miR-4521, weakening its protective effect against diabetes. In this context, what are the endogenous target genes for the miR-4521 that could be regulating diabetes?

      In this study, we focused on the role of miR-4521 in endothelial function. Our animal experiments involving ARG1 knockdown revealed that the reduction of ARG1 expression resulted in the inability of miR-4521 to modulate the progression of type 2 diabetes. Consequently, ARG1 is likely an endogenous target gene of miR-4521, potentially implicated in the regulation of diabetes.

      (14) In the western blot of Figure 5, the β-actin band appears to be different from the genes analyzed. Was the same membrane used for the four proteins? The Ponceau S membrane should be provided.

      As described in our experimental methodology (Western blot analysis), we utilized PVDF membranes for our Western blot experiments. β-actin, recognized for its high expression and specificity as a housekeeping gene, yields distinct bands with minimal background noise. Because of this abundance, the β-actin band can spread laterally from the loading wells during electrophoresis, so that it does not align exactly with the lanes of the target proteins. The other three proteins show clearly separated lanes because their expression is lower than that of β-actin. We have replaced the β-actin blot with one that has a similar background in the revised manuscript (Figure 5L).

      (15) Why did the authors use AAV9, since the AAV9 has a tropism for the liver, heart, skeletal muscle, and not to endothelial vessels?

      AAV9 has garnered significant interest as a gene delivery vector due to its extensive tissue penetration, minimal immunogenicity, and stable gene expression profile. Its application in cardiovascular disease research and therapy has been widely reported (Barbon et al, 2023; Yao et al, 2018; Zincarelli et al, 2008). Meanwhile, we employed AAV9 for gene delivery via the tail vein injection in mice, and as shown in Figure 5J and Figure 7Q, we observed GFP signals carried by AAV9 in the thoracic aorta of mice. These findings suggest that AAV9 possesses the capability to infect endothelial cells effectively.

      Reviewer #2 (Public Review):

      Summary:

      The authors observed an aggravated vascular endothelial dysfunction upon overexpressing circHMGCS1 and inhibiting miR-4521. This study discovered that circHMGCS1 promotes arginase 1 expression by sponging miR-4521, which accelerated the impairment of vascular endothelial function.

      Strengths:

      The study is systematic and establishes the regulatory role of the circHMGCS1-miR-4521 axis in diabetes-induced cardiovascular diseases.

      Weaknesses:

      (1) The authors selected the miR-4521 as the target based on their reduced expression upon circHMGCS1 overexpression. Since the miRNA level is downregulated, the downstream target gene is expected to be upregulated even in the absence of circRNA. The changes in miRNA expression opposite to the levels of target circRNA could be through Target RNA-Directed MicroRNA Degradation. In addition, miRNA can also be stabilized by circRNAs. Hence, selecting miRNA targets based on opposite expression patterns and concluding miRNA sponging by circRNA needs further evidence of direct interactions.

      Thank you for your positive comments and kind suggestions.

      As suggested by Public Reviewer #1 (12), we employed a biotin-tagged circHMGCS1 probe to capture miR-4521 and AGO2 in HUVECs (Supplementary Figures 4D and 4E), and dual-luciferase assays confirmed that miR-4521 can bind to circHMGCS1 directly. Furthermore, RNA pull-down and RIP assays demonstrated the direct binding capability of circHMGCS1 for miR-4521. Collectively, these findings underscore the direct interaction between circHMGCS1 and miR-4521.

      (2) The majority of the experiments were performed with an overexpression vector which can generate a lot of linear RNAs along with circRNAs. The linear RNAs produced by the overexpression vectors can have a similar effect to the circRNA due to sequence identity.

      In our manuscript, the vectors employed incorporate inverted repeat sequences that facilitate efficient circularization of circRNAs. This design ensures robust circularization once circRNA sequences are inserted into the multiple cloning site, thereby enhancing the overexpression of circRNAs (Supplementary Figure 2). Moreover, we used lentivirus as the vector for circRNA overexpression, not direct plasmid transfection. As demonstrated in Figure 2A, upon overexpression of circHMGCS1, we observed a significant upregulation in circHMGCS1 levels compared to the pLV-circNC and Control groups. Notably, the expression levels of the linear HMGCS1 mRNA did not exhibit significant alterations.

      (3) There is a lack of data of circHMGCS1 silencing and its effect on target miRNA & mRNAs.

      Following your suggestion, we employed shRNA to knock down circHMGCS1 in HUVEC, and qRT-PCR was used to assess the expression levels of miR-4521 and ARG1. The knockdown significantly reduced the expression of circHMGCS1 in HUVEC without obviously affecting the levels of HMGCS1 mRNA. We then selected circHMGCS1 shRNA1 for further investigation. We observed that the knockdown of circHMGCS1 resulted in an upregulation of miR-4521 and a downregulation of ARG1 expression.

      Author response image 1.

      The impact of circHMGCS1 knockdown on ARG1 and miR-4521 expression levels in HUVEC. The cells were transfected with either circHMGCS1 shRNA1 or circHMGCS1 shRNA2, and the expression levels of circHMGCS1 and HMGCS1 (A), miR-4521 (B) and ARG1 (C and D) in HUVECs were detected by qRT-PCR and Western blot. n=3 in each group. *p < 0.05, **p < 0.01. Significant differences were determined by one-way ANOVA followed by Bonferroni multiple comparison post hoc test; error bars indicate SD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest improving the discussion based on the literature.

      (1) Line 131. .... (hsa_circ_0008621, 899 nt in length, identified as circHMGCS1 in subsequent studies because of its host gene being HMGCS1). Please, provide the reference.

      We appreciate the valuable comments. We have added the reference at Line 133 (Liang et al, 2021).

      (2) The authors conclude that both in vitro and in vivo data suggest that the miR-4521 or circHMGCS1 fails to regulate the effect of diabetes-induced VED in the absence of ARG1. Therefore, ARG1 may serve as a promising VED biomarker, and circHMGCS1 and miR-4521 play a key role in regulating diabetes-induced VED by ARG1. In this context, they should re-evaluate whether this is the best title. "Circular RNA HMGCS1 sponges miR-4521 to aggravate type 2 diabetes-induced vascular endothelial dysfunction"

      This manuscript begins with circRNA as the focal point of study (Figure 1 and Figure 2). It then examines the miRNAs associated with the circRNA and elucidates their interactions (Figure 3, Figure 4 and Figure 5). Subsequently, the manuscript identifies the target genes of the miRNA and validates the regulatory effects of circRNA and miR-4521 on ARG1 (Figure 6). The study culminates with the application of the ceRNA theory to confirm the significance of ARG1 in the functional interplay between circHMGCS1 and miR-4521 (Figure 7). The findings throughout the manuscript are dedicated to uncovering the pivotal roles of circHMGCS1 and miR-4521 in modulating vascular endothelial function. Notably, the interaction between circHMGCS1 and miR-4521 represents a novel discovery of our research. Therefore, we aim to emphasize the critical function of circHMGCS1 and miR-4521 in the regulation of vascular endothelial dysfunction in type 2 diabetes within the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I have a few suggestions for improving the study further.

      (1) Although the experiments suggest the role of circHMGCS1, miR-4521 in vascular endothelial function, the direct regulation or interaction of circHMGCS1-miR-4521-ARG1 is unclear. A rescue experiment that checks the effect of circHMGCS1 silencing with/without inhibition of miR-4521 on ARG1 expression must be performed to prove the circHMGCS1- miR-4521 regulatory axis.

      Thank you very much for your constructive comments.

      Following your suggestion, we utilized shRNA to knock down circHMGCS1 in HUVEC. Expression analysis via qRT-PCR was then conducted to assess the levels of miR-4521 and ARG1. The knockdown significantly reduced the expression of circHMGCS1 in HUVEC without influencing the expression of the host gene HMGCS1. Concurrently, the knockdown of circHMGCS1 resulted in an upregulation of miR-4521 (Supplementary Figure 4B) and a downregulation of ARG1 (Figure 6P and 6Q). In our manuscript, the upregulation in ARG1 expression caused by circHMGCS1 overexpression was reduced by miR-4521, and the downregulation in ARG1 expression caused by miR-4521 overexpression was also reversed by circHMGCS1. When miR-4521 was knocked down, the expression of ARG1 increased, and circHMGCS1 lost its regulatory effect on the expression of ARG1. Collectively, these findings indicate that the interplay between circHMGCS1 and miR-4521 significantly influences ARG1 expression.

      Author response image 2.

      The impact of circHMGCS1 knockdown on ARG1 and miR-4521 expression levels in HUVEC. The cells were transfected with either circHMGCS1 shRNA1 or circHMGCS1 shRNA2, and the expression levels of circHMGCS1 and HMGCS1 (A), miR-4521 (B) and ARG1 (C and D) in HUVECs were detected by qRT-PCR and Western blot. n=3 in each group. *p < 0.05, **p < 0.01. Significant differences were determined by one-way ANOVA followed by Bonferroni multiple comparison post hoc test; error bars indicate SD.

      (2) It is unclear how the authors arrived at the circHMGCS1-miR-4521 pair. The pull down of circHMGCS1 followed by qPCR enrichment analysis of all target miRNAs must be performed to select the target miRNA.

      In this manuscript, we profiled miRNA expression under PAHG treatment through miRNA sequencing, and then screened out 4 miRNAs with potential binding sites on circHMGCS1 using the miRanda database. Subsequently, we employed qRT-PCR and Western blot analysis to confirm the regulatory influence of miR-4521 on endothelial function (Figure 3). Following this, RIP, RNA pull-down, dual-luciferase and RNA-FISH experiments were conducted to map the interaction between circHMGCS1 and miR-4521 (Figure 4). The direct interaction between circHMGCS1 and miR-4521 was further substantiated through overexpression and knockdown studies (Figures 5-7). While the reviewer's method may offer a more direct validation, our methodology initially involved a database-driven screening of candidate miRNAs with the potential to target and bind circHMGCS1, followed by experimental validation of these interactions. Both methodologies are capable of establishing the interaction sites between circHMGCS1 and miR-4521.

      (3) Since the back splicing is not that efficient, the linear RNA from the overexpression construct may produce many linear RNAs with miRNA binding sites. The effect seen in the case of overexpression experiments needs to consider the level of linear and circular HMGCS1 produced by the vector.

      In this manuscript, the vector's multiple cloning site is flanked by inverted repeat sequences that facilitate efficient circRNA looping. This design enables the inserted sequence to form a stable loop and undergo circularization upon transcription, leading to the overexpression of circRNA (Supplementary Figure 2). For the validation of circular RNA, we employed divergent primers that straddle the circRNA splicing junction. These primers are specific for circRNA amplification and do not amplify the corresponding linear RNA, as demonstrated in Figure 2A. Upon overexpression of circHMGCS1, we observed a significant increase in circHMGCS1 levels compared to the empty vector and Control groups, while there was no significant change in the expression level of HMGCS1 mRNA.

      (4) As miR-4521 has multiple miRNA binding sites on circHMGCS1, it is not very clear which sites were mutated in circHMGCS1-MUT.

      We have made corrections to Supplementary Figure 4C. Utilizing the miRanda algorithm, we identified 10 potential binding sites for miR-4521 on circHMGCS1. Subsequently, we selected the site with the highest binding affinity for mutational analysis (miR-4521 binding positions 3-15, circHMGCS1 binding positions 260-281, binding rate 91.67%, binding ability -17.3 kcal/mol). We employed a dual-luciferase assay to confirm the direct interaction between circHMGCS1 and miR-4521.

      (5) Since the ceRNA network works efficiently in an equimolar concentration of the regulatory molecules, providing the copy number of circHMGCS1, miR-4521, and target mRNAs would be helpful.

      We employed qRT-PCR to ascertain the absolute quantification of mRNA copy numbers, following established methodologies (Nolan et al, 2006; Wagatsuma et al, 2005; Zhang et al, 2009). Our qRT-PCR data reveal that the circHMGCS1 mRNA copy number is 2343±529. In comparison, the ARG1 mRNA copy number stands at 88±27, while the miR-4521 copy number is significantly higher, recorded at 36277±9407.

      Author response image 3.

      The distribution of copy numbers for circHMGCS1, miR-4521 and ARG1 in HUVECs.
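
      For readers less familiar with absolute quantification, the arithmetic behind a standard-curve copy-number estimate can be sketched as follows. This is only an illustration under stated assumptions: the function names, slope, intercept and Ct values are hypothetical placeholders and are not taken from our experiments or from the cited protocols.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(copies) + intercept.

    slope and intercept come from a dilution series of a template with
    known copy number (hypothetical values are used in the example below).
    """
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical standard curve and sample Ct, for illustration only.
    slope, intercept = -3.5, 38.0
    sample_ct = 27.5
    print("estimated copies:", copies_from_ct(sample_ct, slope, intercept))
    print("efficiency:", amplification_efficiency(slope))
```

      The same inversion is applied separately to each target, each with its own standard curve, before the copy numbers of different species are compared.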

      (6) The yellow highlighted "cyclization-mediated sequence-F & R" does not seem to be complementary sequences. The method section may include the details of the vectors and cloning strategies for the overexpression constructs.

      The figure below illustrates the schematic representation of the complementary structure between the upstream and downstream sequences that facilitate circRNA circularization. This strategic pairing is designed to enhance the circularization efficiency of circRNA while concurrently suppressing mRNA synthesis (Liang & Wilusz, 2014). Details of this design have been integrated into the experimental method (Line 539). The specific additions are as follows:

      The circHMGCS1 sequence [NM_001098272: 43292575-43297268], the splice site AG/GT and ALU elements were inserted into the pCDH-circRNA-GFP vector (upstream ALU: AAAGTGCTGAGATTACAGGCGTGAGCCACCACCCCCGGCCCACTTTTTGTAAAGGTACGTACTAATGACTTTTTTTTTATACTTCAG, downstream ALU: GTAAGAAGCAAGGAAAAGAATTAGGCTCGGCACGGTAGCTCACACCTGTAATCCCAGCA). The restriction enzyme sites selected were EcoRI and NotI.

      Author response image 4.
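
      As a rough aid for checking complementarity between two flanking sequences, the following sketch reports the longest perfectly Watson-Crick-paired stretch between an upstream and a downstream sequence. It is an illustrative helper only: the function names are hypothetical, the toy sequences are not the ALU elements above, and real intronic repeats pair with mismatches and G-U wobbles that this simple check ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(upstream, downstream):
    """Length of the longest stretch of `upstream` that can pair perfectly
    with some stretch of `downstream`.

    A stretch pairs perfectly if it occurs in the reverse complement of
    `downstream`, so this is a longest-common-substring search.
    """
    up = upstream.upper()
    rc_down = reverse_complement(downstream)
    best = 0
    prev = [0] * (len(rc_down) + 1)
    for i in range(1, len(up) + 1):
        cur = [0] * (len(rc_down) + 1)
        for j in range(1, len(rc_down) + 1):
            if up[i - 1] == rc_down[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(longest_complementary_stretch("AAGCTGAGG", "TTCCTCAGCTT"))
```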

      (7) Since circHMGCS1 is a multi-exonic circRNA that can undergo alternative splicing and divergent primers only validate the backsplice junction, the full-length sequence of mature circHMGCS1 needs to be checked by circRNA-RCA PCR followed by Sanger sequencing.

      In compliance with your guidance, we have enriched the revised manuscript with additional data. Specifically, we have included the full-length nucleic acid electrophoresis diagram of circHMGCS1 in Supplementary Figure 1F, the Sanger sequencing results in Supplementary Figure 1G, and a comparative analysis of the circHMGCS1 sequences obtained from Sanger sequencing with those referenced in the circBase database, presented in Supplementary Figure 1H.

      Reference:

      Barbon, E., C. Kawecki, S. Marmier, A. Sakkal, F. Collaud, S. Charles, G. Ronzitti, C. Casari, O.D. Christophe, C.V. Denis, P.J. Lenting, and F. Mingozzi. 2023. Development of a dual hybrid AAV vector for endothelial-targeted expression of von Willebrand factor. Gene Ther. 30: 245-254.

      Fu, Y., J. Chen, and Z. Huang. 2019. Recent progress in microRNA-based delivery systems for the treatment of human disease. ExRNA. 1: 24.

      Kristensen, L.S., M.S. Andersen, L.V.W. Stagsted, K.K. Ebbesen, T.B. Hansen, and J. Kjems. 2019. The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet. 20: 675-691.

      Liang, D., and J.E. Wilusz. 2014. Short intronic repeat sequences facilitate circular RNA production. Genes Dev. 28: 2233-2247.

      Liang, J., X. Li, J. Xu, G.M. Cai, J.X. Cao, and B. Zhang. 2021. hsa_circ_0072389, hsa_circ_0072386, hsa_circ_0008621, hsa_circ_0072387, and hsa_circ_0072391 aggravate glioma via miR-338-5p/IKBIP. Aging (Albany NY). 13: 25213-25240.

      Liu, C.X., and L.L. Chen. 2022. Circular RNAs: Characterization, cellular roles, and applications. Cell. 185: 2016-2034.

      Nolan, T., R.E. Hands, and S.A. Bustin. 2006. Quantification of mRNA using real-time RT-PCR. Nat Protoc. 1: 1559-1582.

      Wagatsuma, A., H. Sadamoto, T. Kitahashi, K. Lukowiak, A. Urano, and E. Ito. 2005. Determination of the exact copy numbers of particular mRNAs in a single cell by quantitative real-time RT-PCR. J Exp Biol. 208: 2389-2398.

      Yao, C., T. Veleva, L. Scott, Jr., S. Cao, L. Li, G. Chen, P. Jeyabal, X. Pan, K.M. Alsina, I.D. Abu-Taha, S. Ghezelbash, C.L. Reynolds, Y.H. Shen, S.A. Lemaire, W. Schmitz, F.U. Müller, A. El-Armouche, N. Tony Eissa, C. Beeton, S. Nattel, X.H.T. Wehrens, D. Dobrev, and N. Li. 2018. Enhanced Cardiomyocyte NLRP3 Inflammasome Signaling Promotes Atrial Fibrillation. Circulation. 138: 2227-2242.

      Zhang, X.X., T. Zhang, M. Zhang, H.H. Fang, and S.P. Cheng. 2009. Characterization and quantification of class 1 integrons and associated gene cassettes in sewage treatment plants. Appl Microbiol Biotechnol. 82: 1169-1177.

      Zincarelli, C., S. Soltys, G. Rengo, and J.E. Rabinowitz. 2008. Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection. Mol Ther. 16: 1073-1080.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation to modulate the function of a therapeutic antibody. As a follow-up to their previous demonstration on how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements, with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of (FcγR)IIIa and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact (or the best possible assessment) of the glycan composition at the N162 site is not defined.

      We will revise the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.”  

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We will revise Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We will restructure the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data were not previously available, highlighting hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We will add the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and will remove the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We will add an example of the digestions as Figure S2.

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text.

      Please correct this. 

      Thank you for identifying this typo. We will correct Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated.

      Thank you for drawing our attention to this rounding error. We will revise the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/-fuc and +/- Kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values that included all cell lines due to some cell lines experiencing a significant change and others not. However, we will add the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that there are points that overlap in this figure between the different treatments. However, our use of Student's t-test (two-tailed) using three experiments collected on three different days (each with three technical replicates) provides enough resolution to determine the significance of the difference between the means for the different treatments. This is, by our estimation, a highly rigorous manner to collect and analyze the data.
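
      To make the structure of this comparison concrete, a minimal sketch of a two-tailed, pooled-variance Student's t-test on per-day means follows. It assumes, for illustration only, that the three technical replicates from each day are averaged first so that each treatment contributes three independent values; the numbers are hypothetical and the function names are not from any analysis package.

```python
from math import sqrt
from statistics import mean, variance


def experiment_means(replicates_by_day):
    """Collapse technical replicates to one mean per independent experiment."""
    return [mean(day) for day in replicates_by_day]


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Two-tailed 5% critical value of the t distribution with 4 degrees of freedom.
T_CRIT_DF4 = 2.776

if __name__ == "__main__":
    # Hypothetical ADCC readouts: 3 days x 3 technical replicates per treatment.
    untreated = [[20.1, 21.0, 19.5], [24.2, 23.8, 25.0], [22.0, 21.4, 22.9]]
    treated = [[29.8, 31.0, 30.2], [33.5, 32.9, 34.1], [30.7, 31.6, 31.1]]
    t, df = student_t(experiment_means(treated), experiment_means(untreated))
    print(f"t = {t:.3f}, df = {df}, significant at 5%: {abs(t) > T_CRIT_DF4}")
```

      With three independent experiments per group there are four degrees of freedom, which is why visually overlapping points can still yield a significant difference between means when day-to-day variation is small relative to the treatment effect.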

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similar large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high mannose and possibly a few hybrid glycans. It is difficult to understand what glycoforms are truly important outside of stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which is mostly Man8 and Man9 structures. In our previous work, these structures likewise provide increased binding affinity (see PubMed ID 30016589). We believe the most important message is that composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we are not able to modulate N162 glycan composition without affecting potentially every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, this increased ADCC resulting from kifunensine treatment is consistent with previously observed increases in binding affinity measurements.

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know that N162 glycan composition varies substantially from donor to donor based on previous work from our lab. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, there is the potential that different NK cell N162 glycan compositions are coincident with different indications. This is an area we are quite interested in pursuing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the contributions of the long noncoding RNA snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that the nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and as a general rule the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

      Weaknesses:

      Despite efforts at generating a liver-specific knockout, the phenotypic characterization is not focused on the key readouts. Notably missing are rigorous lipid flux studies and targeted gene expression/protein measurement that would underpin why the loss of Snhg3 protects from lipid accumulation. Along those lines, claims linking the Snhg3 to MAFLD would be better supported with careful interrogation of markers of fibrosis and advanced liver disease. In other areas, significance is limited since the presented data is either not clear or rigorous enough. Finally, there is an important conceptual limitation to the work since PPARG is not established to play a major role in the liver.

      We thank the reviewer for the detailed comment. In this study, hepatocyte-specific Snhg3 deficiency decreased body and liver weight and alleviated hepatic steatosis in DIO mice, whereas overexpression induced the opposite effect (Figures 2 and 3). Furthermore, we investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq, and GSEA revealed that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). We validated the expression of some DEGs involved in fatty acid metabolism by RT-qPCR. The results showed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and downregulated in Snhg3-HKI mice compared to the controls (Figure 4C). Please see the first paragraph on p8.

      As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and HSCs) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hint at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively induced a decrease or increase in hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the hepatic Cd36 and Cidea/c increase and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans and in clinical studies, and that PPARγ agonist administration did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discussed this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      Hepatotoxicity accelerates the development of progressive inflammation, oxidative stress and fibrosis (Roehlen et al., 2020). Chronic liver injury including MASLD can progress to liver fibrosis with the formation of a fibrous scar. Injured hepatocytes can secrete fibrogenic factors or exosomes containing miRNAs that activate HSCs, the major source of the fibrous scar in liver fibrosis (Kisseleva and Brenner, 2021). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). In this study, no hepatic fibrosis phenotype was seen in Snhg3-HKO and Snhg3-HKI mice (figures supplement 1D and 2D). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as collagen type I alpha 1/2 (Col1a1 and Col1a2), but had no effects on the pro-inflammatory factors, including transforming growth factor β1 (Tgfβ1), tumor necrosis factor α (Tnfα), interleukin 6 and 1β (Il6 and Il1β) (figures supplement 3A and B). Inflammation is an absolute requirement for fibrosis because factors from injured hepatocytes alone are not sufficient to directly activate HSCs and lead to fibrosis (Kisseleva and Brenner, 2021). Additionally, previous studies indicated that exposure to HFD for more than 24 weeks causes less severe fibrosis (Alshawsh et al., 2022). In future, the effect of Snhg3 on hepatic fibrosis in mice needs to be elucidated by prolonged high-fat feeding or by adopting methionine- and choline-deficient diet (MCD) feeding. Please see the second paragraph of the Discussion section on p13.

      References

      ALSHAWSH, M. A., ALSALAHI, A., ALSHEHADE, S. A., SAGHIR, S. A. M., AHMEDA, A. F., AL ZARZOUR, R. H. & MAHMOUD, A. M. 2022. A Comparison of the Gene Expression Profiles of Non-Alcoholic Fatty Liver Disease between Animal Models of a High-Fat Diet and Methionine-Choline-Deficient Diet. Molecules, 27. DOI:10.3390/molecules27030858, PMID:35164140

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      KISSELEVA, T. & BRENNER, D. 2021. Molecular and cellular mechanisms of liver fibrosis and its regression. Nat Rev Gastroenterol Hepatol, 18, 151-166. DOI:10.1038/s41575-020-00372-7, PMID:33128017

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      ROEHLEN, N., CROUCHET, E. & BAUMERT, T. F. 2020. Liver Fibrosis: Mechanistic Concepts and Therapeutic Perspectives. Cells, 9. DOI:10.3390/cells9040875, PMID:32260126

      Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al found lncRNA Snhg3 was one of the most down-regulated Snhgs by a high-fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on an HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3 interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how lncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study. There are, however, several potential issues to consider before jumping to a conclusion.

      (1) First of all, it's important to ensure the robustness and rigor of each study. The manuscript was not carefully put together. The image qualities for several figures were poor, making it difficult for the readers to evaluate the results with confidence. The biological replicates and numbers of experimental repeats for cell-based assays were not described. When possible, the entire immunoblot imaging used for quantification should be presented (rather than showing n=1 representative). There were multiple mislabels in figure panels or figure legends (e.g., Figure 2I, Figure 2K, and Figure 3K). The b-actin immunoblot image was reused in Figure 4J, Figure 5G, and Figure 7B with different exposure times. These might be from the same cohort of mice. If the immunoblots were run at different times, the loading control should be included on the same blot as well.

      We thank the reviewer for the detailed comment. We have provided clearer figures in the revised manuscript.

      The biological replicates and numbers of experimental repeats for cell-based assays have been updated in the manuscript.

      The entire immunoblot images used for quantification have been provided in the primary data.

      The original Figure 2I, Figure 2K, and Figure 3K have been revised and replaced with new Figure 2F, Figure 2H, and Figure 3H, and their corresponding figure legends have also been corrected in the revised manuscript.

      The protein levels of CD36, PPARγ and β-ACTIN were examined at the same time, and we have revised the manuscript accordingly; please see revised Figure 7B and 7C.

      (2) The authors can do a better job in explaining the logic for how they came up with the potential function of each component of the signaling cascade. Snhg3 is down-regulated by HFD. However, the evidence presented indicates its involvement in promoting steatosis. In Figure 1C, one would expect PPARg expression to be up-regulated (when Snhg3 was down-regulated). If so, the physiological observation conflicts with the proposed mechanism. In addition, SND1 is known to regulate RNA/miRNA processing. How do the authors rule out this potential mechanism? How about the hosting snoRNA, Snord17? Is it involved in the progression of MASLD?

      We thank the reviewer for the detailed comment. Our results showed that the expression of Snhg3 was decreased in DIO mice, which led us to speculate that the downregulation of Snhg3 in DIO mice might be a protective stress response to a high nutritional state, but the specific details need to be clarified. This is probably similar to fibroblast growth factor 21 (FGF21) and growth differentiation factor 15 (GDF15), whose endogenous expression and circulating levels are elevated in obese humans and mice despite their beneficial effects on obesity and related metabolic complications (Keipert and Ost, 2021). Although FGF21 can be induced by oxidative stress and be activated in obese mice and in NASH patients, elevated FGF21 paradoxically protects against oxidative stress and reduces hepatic steatosis (Tillman and Rolph, 2020). We have added this content to the Discussion section; please see the second paragraph on p12.

      SND1 has multiple roles through associating with different types of RNA molecules, including mRNA, miRNA, circRNA, dsRNA and lncRNA. SND1 binds negative-sense SARS-CoV-2 RNA and promotes viral RNA synthesis (Schmidt et al., 2023). SND1 is also involved in hypoxia by negatively regulating hypoxia-related miRNAs (Saarikettu et al., 2023). Furthermore, a recent study revealed that lncRNA SNAI3-AS1 can competitively bind to SND1 and perturb the m6A-dependent recognition of Nrf2 mRNA 3'UTR by SND1, thereby reducing the mRNA stability of Nrf2 (Zheng et al., 2023). Huang et al. also reported that circMETTL9 can directly bind to and increase the expression of SND1 in astrocytes, leading to enhanced neuroinflammation (Huang et al., 2023). However, whether SND1/lncRNA-Snhg3 has a histone methylation-independent role in hepatic lipid metabolism needs to be further investigated. We also discussed this limitation in the manuscript; please refer to the third paragraph of the Discussion section on p17.

      Snhg3 serves as the host gene for the intronic U17 snoRNAs, which are H/ACA snoRNAs. A previous study found that the cholesterol trafficking phenotype was not due to reduced Snhg3 expression, but rather to haploinsufficiency of U17 snoRNA. Upregulation of hypoxia-upregulated mitochondrial movement regulator (HUMMR) in U17 snoRNA-deficient cells promoted the formation of ER-mitochondrial contacts, resulting in decreased cholesterol esterification and facilitating cholesterol trafficking to mitochondria (Jinn et al., 2015). Additionally, disruption of U17 snoRNA caused resistance to lipid-induced cell death and general oxidative stress in cultured cells. Furthermore, knockdown of U17 snoRNA in vivo protected against hepatic steatosis and lipid-induced oxidative stress and inflammation (Sletten et al., 2021). We determined the expression of hepatic U17 snoRNA and its effect on SND1 and PPARγ. The results showed that the expression of U17 snoRNA was decreased in the liver of DIO Snhg3-HKO mice and unchanged in the liver of DIO Snhg3-HKI mice, but overexpression of U17 snoRNA had no effect on the expression of SND1 and PPARγ (figure supplement 5A-C), indicating that Snhg3-induced hepatic steatosis was independent of U17 snoRNA. We also discussed this in the revised manuscript; please refer to the Discussion section on p15.

      References

      HUANG, C., SUN, L., XIAO, C., YOU, W., SUN, L., WANG, S., ZHANG, Z. & LIU, S. 2023. Circular RNA METTL9 contributes to neuroinflammation following traumatic brain injury by complexing with astrocytic SND1. J Neuroinflammation, 20, 39. DOI:10.1186/s12974-023-02716-x, PMID:36803376

      JINN, S., BRANDIS, K. A., REN, A., CHACKO, A., DUDLEY-RUCKER, N., GALE, S. E., SIDHU, R., FUJIWARA, H., JIANG, H., OLSEN, B. N., SCHAFFER, J. E. & ORY, D. S. 2015. snoRNA U17 regulates cellular cholesterol trafficking. Cell Metab, 21, 855-67. DOI:10.1016/j.cmet.2015.04.010, PMID:25980348

      KEIPERT, S. & OST, M. 2021. Stress-induced FGF21 and GDF15 in obesity and obesity resistance. Trends Endocrinol Metab, 32, 904-915. DOI:10.1016/j.tem.2021.08.008, PMID:34526227

      SAARIKETTU, J., LEHMUSVAARA, S., PESU, M., JUNTTILA, I., PARTANEN, J., SIPILA, P., POUTANEN, M., YANG, J., HAIKARAINEN, T. & SILVENNOINEN, O. 2023. The RNA-binding protein Snd1/Tudor-SN regulates hypoxia-responsive gene expression. FASEB Bioadv, 5, 183-198. DOI:10.1096/fba.2022-00115, PMID:37151849

      SCHMIDT, N., GANSKIH, S., WEI, Y., GABEL, A., ZIELINSKI, S., KESHISHIAN, H., LAREAU, C. A., ZIMMERMANN, L., MAKROCZYOVA, J., PEARCE, C., KREY, K., HENNIG, T., STEGMAIER, S., MOYON, L., HORLACHER, M., WERNER, S., AYDIN, J., OLGUIN-NAVA, M., POTABATTULA, R., KIBE, A., DOLKEN, L., SMYTH, R. P., CALISKAN, N., MARSICO, A., KREMPL, C., BODEM, J., PICHLMAIR, A., CARR, S. A., CHLANDA, P., ERHARD, F. & MUNSCHAUER, M. 2023. SND1 binds SARS-CoV-2 negative-sense RNA and promotes viral RNA synthesis through NSP9. Cell, 186, 4834-4850 e23. DOI:10.1016/j.cell.2023.09.002, PMID:37794589

      SLETTEN, A. C., DAVIDSON, J. W., YAGABASAN, B., MOORES, S., SCHWAIGER-HABER, M., FUJIWARA, H., GALE, S., JIANG, X., SIDHU, R., GELMAN, S. J., ZHAO, S., PATTI, G. J., ORY, D. S. & SCHAFFER, J. E. 2021. Loss of SNORA73 reprograms cellular metabolism and protects against steatohepatitis. Nat Commun, 12, 5214. DOI:10.1038/s41467-021-25457-y, PMID:34471131

      TILLMAN, E. J. & ROLPH, T. 2020. FGF21: An Emerging Therapeutic Target for Non-Alcoholic Steatohepatitis and Related Metabolic Diseases. Front Endocrinol (Lausanne), 11, 601290. DOI:10.3389/fendo.2020.601290, PMID:33381084

      ZHENG, J., ZHANG, Q., ZHAO, Z., QIU, Y., ZHOU, Y., WU, Z., JIANG, C., WANG, X. & JIANG, X. 2023. Epigenetically silenced lncRNA SNAI3-AS1 promotes ferroptosis in glioma via perturbing the m(6)A-dependent recognition of Nrf2 mRNA mediated by SND1. J Exp Clin Cancer Res, 42, 127. DOI:10.1186/s13046-023-02684-3, PMID:37202791

      (3) The role of PPARg in fatty liver diseases might be a rodent-specific phenomenon. PPARg agonist treatment in humans may actually reduce ectopic fat deposition by increasing fat storage in adipose tissues. The relevance of the findings to human diseases should be discussed.

      We thank the reviewer for the detailed comment. As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and hepatic stellate cells (HSCs)) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hinted at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively decreased or increased hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the hepatic Cd36 and Cidea/c increase and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans and that, in clinical studies, PPARγ agonist administration did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discussed this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      References

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As a general strategy for the revision, I would advise the authors to focus on strengthening the analysis of the liver with the two most important figures being Figure 2 and Figure 3. The mechanism as it stands is problematic which reduces the impact of the animal studies despite substantial efforts from the authors. Consider removing or toning down some of the studies focused on mechanisms in the nucleus, including changing the title.

      We thank the reviewer for the detailed comment. In this study, hepatocyte-specific Snhg3 deficiency decreased body and liver weight, alleviated hepatic steatosis and promoted hepatic fatty acid metabolism in DIO mice, whereas overexpression induced the opposite effect. We investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq and revealed, using GSEA, that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). RT-qPCR analysis confirmed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and were downregulated in Snhg3-HKI mice compared to the controls (Figure 4C). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as Col1a1 and Col1a2, but had no effect on the pro-inflammatory factors, including Tgfβ1, Tnfα, Il6 and Il1β (figure supplement 3A and B). The results indicated that Snhg3 is involved in hepatic steatosis by regulating fatty acid metabolism. Furthermore, PPARγ was selected to study its role in Snhg3-induced hepatic steatosis through integrated analysis of the data from CUT&Tag-Seq, ATAC-Seq and RNA-Seq. Finally, inhibition of PPARγ with T0070907 alleviated the Snhg3-induced Cd36 and Cidea/c increases and improved Snhg3-aggravated hepatic steatosis. In summary, we confirmed that SND1/H3K27me3/PPARγ is partially responsible for Snhg3-induced hepatic steatosis. As the reviewer suggested, we replaced the title with “LncRNA-Snhg3 Aggravates Hepatic Steatosis via PPARγ Signaling”.

      (1) How is steatosis changing in the liver? Is this due to a change in fatty acid uptake, lipogenesis/synthesis, beta-oxidation, trig secretion, etc..? The analysis in Figures 2 and 3 is mostly focused on metabolic chamber studies which seem distracting, particularly in the absence of a mechanism and given a liver-specific perturbation. The authors should use a combination of targeted gene expression, protein blots, and lipid flux measurements to provide better insights here. The histology in Figure 2H suggests a very dramatic effect but does match with lipid measurements in 2I.

      We thank the reviewer for the detailed comment. The pathogenesis of MASLD has not been entirely elucidated. Multiple factors, such as genetic and epigenetic factors, nutritional factors, insulin resistance, lipotoxicity, the microbiome, fibrogenesis and hormones secreted from adipose tissue, are recognized to be involved in the development and progression of MASLD (Buzzetti et al., 2016, Lee et al., 2017, Rada et al., 2020, Sakurai et al., 2021, Friedman et al., 2018). In this study, we investigated the hepatic differentially expressed genes (DEGs) between the DIO Snhg3-HKI and control WT mice using RNA-Seq and revealed, using GSEA, that Snhg3 exerts a global effect on the expression of genes involved in fatty acid metabolism (Figure 4B). We validated the expression of some DEGs involved in fatty acid metabolism by RT-qPCR. The results showed that the hepatic expression levels of some genes involved in fatty acid metabolism, including Cd36, Cidea/c and Scd1/2, were upregulated in Snhg3-HKO mice and were downregulated in Snhg3-HKI mice compared to the controls (Figure 4C). Additionally, we re-analyzed the metabolic chamber data using CalR, and the results showed no obvious differences in heat production, total oxygen consumption, carbon dioxide production or RER between DIO Snhg3-HKO or DIO Snhg3-HKI mice and the corresponding control mice (figure supplement 1C and 2C). Unfortunately, we could not measure lipid flux because of limited experimental conditions. In summary, however, our results indicated that Snhg3 is involved in hepatic steatosis by regulating fatty acid metabolism. Please see the first paragraph on p8.

      Additionally, we determined the hepatic TC levels in another batch of DIO Snhg3-HKO and control mice and found no difference in hepatic TC (shown below) between DIO Snhg3-HKO and control mice fed an HFD for 18 weeks. An apparent difference in TC may require a longer period of high-fat diet feeding.

      Author response image 1.

      Hepatic TC contents in DIO Snhg3-Flox and Snhg3-HKO mice.

      References

      BUZZETTI, E., PINZANI, M. & TSOCHATZIS, E. A. 2016. The multiple-hit pathogenesis of non-alcoholic fatty liver disease (NAFLD). Metabolism, 65, 1038-48. DOI:10.1016/j.metabol.2015.12.012, PMID:26823198

      FRIEDMAN, S. L., NEUSCHWANDER-TETRI, B. A., RINELLA, M. & SANYAL, A. J. 2018. Mechanisms of NAFLD development and therapeutic strategies. Nat Med, 24, 908-922. DOI:10.1038/s41591-018-0104-9, PMID:29967350

      LEE, J., KIM, Y., FRISO, S. & CHOI, S. W. 2017. Epigenetics in non-alcoholic fatty liver disease. Mol Aspects Med, 54, 78-88. DOI:10.1016/j.mam.2016.11.008, PMID:27889327

      RADA, P., GONZALEZ-RODRIGUEZ, A., GARCIA-MONZON, C. & VALVERDE, A. M. 2020. Understanding lipotoxicity in NAFLD pathogenesis: is CD36 a key driver? Cell Death Dis, 11, 802. DOI:10.1038/s41419-020-03003-w, PMID:32978374

      SAKURAI, Y., KUBOTA, N., YAMAUCHI, T. & KADOWAKI, T. 2021. Role of Insulin Resistance in MAFLD. Int J Mol Sci, 22. DOI:10.3390/ijms22084156, PMID:33923817

      (2) Throughout the manuscript the authors make claims about liver disease models, but this is not well supported since markers of advanced liver disease are not examined. The authors should stain and show expression for fibrosis and inflammation.

      We thank the reviewer for the detailed comment. Metabolic dysfunction-associated steatotic liver disease (MASLD) is characterized by excess liver fat in the absence of significant alcohol consumption. It can progress from simple steatosis to metabolic dysfunction-associated steatohepatitis (MASH) and fibrosis and eventually to chronic progressive diseases such as cirrhosis, end-stage liver failure, and hepatocellular carcinoma (Loomba et al., 2021). As the reviewer suggested, we examined the effect of Snhg3 on liver fibrosis and inflammation. No hepatic fibrosis phenotype was seen in Snhg3-HKO and Snhg3-HKI mice (figure supplement 1D and 2D). Moreover, deficiency and overexpression of Snhg3 respectively decreased and increased the expression of profibrotic genes, such as collagen type I alpha 1/2 (Col1a1 and Col1a2), but had no effect on the pro-inflammatory factors including Tgf-β, Tnf-α, Il-6 and Il-1β (figure supplement 3A and 3B). Inflammation is an absolute requirement for fibrosis because factors from injured hepatocytes alone are not sufficient to directly activate HSCs and lead to fibrosis (Kisseleva and Brenner, 2021). Additionally, previous studies indicated that exposure to HFD for more than 24 weeks causes less severe fibrosis (Alshawsh et al., 2022). In the future, the effect of Snhg3 on hepatic fibrosis in mice needs to be elucidated by prolonged high-fat feeding or by adopting methionine- and choline-deficient (MCD) diet feeding. Please see the second paragraph of the Discussion section on p13.

      References

      ALSHAWSH, M. A., ALSALAHI, A., ALSHEHADE, S. A., SAGHIR, S. A. M., AHMEDA, A. F., AL ZARZOUR, R. H. & MAHMOUD, A. M. 2022. A Comparison of the Gene Expression Profiles of Non-Alcoholic Fatty Liver Disease between Animal Models of a High-Fat Diet and Methionine-Choline-Deficient Diet. Molecules, 27. DOI:10.3390/molecules27030858, PMID:35164140

      KISSELEVA, T. & BRENNER, D. 2021. Molecular and cellular mechanisms of liver fibrosis and its regression. Nat Rev Gastroenterol Hepatol, 18, 151-166. DOI:10.1038/s41575-020-00372-7, PMID:33128017

      LOOMBA, R., FRIEDMAN, S. L. & SHULMAN, G. I. 2021. Mechanisms and disease consequences of nonalcoholic fatty liver disease. Cell, 184, 2537-2564. DOI:10.1016/j.cell.2021.04.015, PMID:33989548

      (3) Publicly available datasets show that PPARG protein is not expressed in the liver (Science 2015 347(6220):1260419, PMID: 25613900). Are the authors sure this is not an effect on another PPAR isoform like alpha? ChIP and RNA-seq pathway readouts do not distinguish between different isoforms.

      We thank the reviewer for the detailed comment. As a transcription regulator of Cd36 and Cidea/c, PPARγ is well known to play major adipogenic and lipogenic roles in adipose tissue. Although the expression of PPARγ in the liver is very low under healthy conditions, induced expression of PPARγ in both hepatocytes and non-parenchymal cells (Kupffer cells, immune cells, and hepatic stellate cells (HSCs)) in the liver has a crucial role in the pathophysiology of MASLD (Lee et al., 2023b, Chen et al., 2023, Gross et al., 2017). The activation of PPARγ in the liver induces the adipogenic program to store fatty acids in lipid droplets as observed in adipocytes (Lee et al., 2018). Moreover, the inactivation of liver PPARγ abolished the rosiglitazone-induced increase in hepatic TG and improved hepatic steatosis in lipoatrophic AZIP mice (Gavrilova et al., 2003). Apart from promoting lipogenesis, PPARγ also has a crucial function in improving inflammation and fibrosis (Chen et al., 2023). Furthermore, there is a strong correlation between the onset of hepatic steatosis and hepatocyte-specific PPARγ expression. Clinical trials have also indicated that increased insulin resistance and hepatic PPARγ expression were associated with NASH scores in some obese patients (Lee et al., 2023a, Mukherjee et al., 2022). Even though PPARγ’s primary function is in adipose tissue, patients with MASLD have much higher hepatic expression levels of PPARγ, reflecting the fact that PPARγ plays different roles in different tissues and cell types (Mukherjee et al., 2022). Consistent with the studies mentioned above, our results also hinted at the importance of PPARγ in the pathophysiology of MASLD. Snhg3 deficiency or overexpression respectively decreased or increased hepatic PPARγ. Moreover, administration of the PPARγ antagonist T0070907 mitigated the hepatic Cd36 and Cidea/c increase and improved Snhg3-induced hepatic steatosis. However, conflicting findings suggest that the expression of hepatic PPARγ is not increased as steatosis develops in humans and that, in clinical studies, PPARγ agonist administration did not aggravate liver steatosis (Gross et al., 2017). Thus, understanding how hepatic PPARγ expression is regulated may provide a new avenue to prevent and treat MASLD (Lee et al., 2018). We also discussed this in the revised manuscript; please refer to the first paragraph of the Discussion section on p13.

      PPARα, which is most highly expressed in the liver, transcriptionally regulates lipid catabolism by regulating the expression of genes mediating triglyceride hydrolysis, fatty acid transport, and β-oxidation. Activators of PPARα decrease plasma triglycerides by inhibiting their synthesis and accelerating their hydrolysis (Chen et al., 2023). Mice with deletion of the Pparα gene exhibited more hepatic steatosis under HFD induction. As the reviewer suggested, we investigated the effect of Snhg3 on Pparα expression. The result showed that neither deficiency nor overexpression of Snhg3 affected the mRNA level of Pparα, as shown below, indicating that Snhg3-induced lipid accumulation is independent of PPARα. Additionally, the exon, upstream 2k, 5’-UTR and intron regions of Pparγ, not Pparα, were enriched with the H3K27me3 mark (fold_enrichment = 4.15697) in the liver of DIO Snhg3-HKO mice using the CUT&Tag assay (table supplement 8), which was further confirmed by ChIP (Figure 6F and G). Therefore, we chose PPARγ to study its role in Snhg3-induced hepatic steatosis through integrated analysis of the data from CUT&Tag-Seq, ATAC-Seq and RNA-Seq.

      Author response image 2.

      The mRNA levels of hepatic Pparα expression in DIO Snhg3-HKO mice and Snhg3-HKI mice compared to the controls.

      References

      CHEN, H., TAN, H., WAN, J., ZENG, Y., WANG, J., WANG, H. & LU, X. 2023. PPAR-gamma signaling in nonalcoholic fatty liver disease: Pathogenesis and therapeutic targets. Pharmacol Ther, 245, 108391. DOI:10.1016/j.pharmthera.2023.108391, PMID:36963510

      GAVRILOVA, O., HALUZIK, M., MATSUSUE, K., CUTSON, J. J., JOHNSON, L., DIETZ, K. R., NICOL, C. J., VINSON, C., GONZALEZ, F. J. & REITMAN, M. L. 2003. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J Biol Chem, 278, 34268-76. DOI:10.1074/jbc.M300043200, PMID:12805374

      GROSS, B., PAWLAK, M., LEFEBVRE, P. & STAELS, B. 2017. PPARs in obesity-induced T2DM, dyslipidaemia and NAFLD. Nat Rev Endocrinol, 13, 36-49. DOI:10.1038/nrendo.2016.135, PMID:27636730

      LEE, S. M., MURATALLA, J., KARIMI, S., DIAZ-RUIZ, A., FRUTOS, M. D., GUZMAN, G., RAMOS-MOLINA, B. & CORDOBA-CHACON, J. 2023a. Hepatocyte PPARgamma contributes to the progression of non-alcoholic steatohepatitis in male and female obese mice. Cell Mol Life Sci, 80, 39. DOI:10.1007/s00018-022-04629-z, PMID:36629912

      LEE, S. M., MURATALLA, J., SIERRA-CRUZ, M. & CORDOBA-CHACON, J. 2023b. Role of hepatic peroxisome proliferator-activated receptor gamma in non-alcoholic fatty liver disease. J Endocrinol, 257. DOI:10.1530/JOE-22-0155, PMID:36688873

      LEE, Y. K., PARK, J. E., LEE, M. & HARDWICK, J. P. 2018. Hepatic lipid homeostasis by peroxisome proliferator-activated receptor gamma 2. Liver Res, 2, 209-215. DOI:10.1016/j.livres.2018.12.001, PMID:31245168

      MUKHERJEE, A. G., WANJARI, U. R., GOPALAKRISHNAN, A. V., KATTURAJAN, R., KANNAMPUZHA, S., MURALI, R., NAMACHIVAYAM, A., GANESAN, R., RENU, K., DEY, A., VELLINGIRI, B. & PRINCE, S. E. 2022. Exploring the Regulatory Role of ncRNA in NAFLD: A Particular Focus on PPARs. Cells, 11. DOI:10.3390/cells11243959, PMID:36552725

      (4) Previous work suggests that SNHG3 regulates its neighboring gene MED18 which is an important regulator of global transcription. Could some of the observed effects be due to changes in MED18 or other neighboring genes?

      We thank the reviewer for the detailed comment. Previous work suggested that human SNHG3 promotes progression of gastric cancer by regulating neighboring MED18 gene methylation (Xuan and Wang, 2019). Here, we studied the effect of mouse Snhg3 on Med18, and the result showed that Snhg3 had no effect on the mRNA levels of Med18 (shown below). We also tested the effect of mouse Snhg3 on another neighboring gene, regulator of chromosome condensation 1 (Rcc1). Although deficiency of Snhg3 reduced the mRNA level of Rcc1, overexpression of Snhg3 did not affect it, as shown below. RCC1, the only known guanine nucleotide exchange factor in the nucleus for Ran, a nuclear Ras-like G protein, directly participates in cellular processes such as nuclear envelope formation, nucleocytoplasmic transport, and spindle formation (Ren et al., 2020). RCC1 also regulates chromatin condensation in the late S and early M phases of the cell cycle. Many studies have found that RCC1 plays an important role in tumors. Whether Rcc1 mediates the effect of Snhg3 on MASLD needs to be further investigated.

      Author response image 3.

      The mRNA levels of hepatic Rcc1 and Med18 expression in DIO Snhg3-HKO mice and Snhg3-HKI mice compared to the controls.

      References

      REN, X., JIANG, K. & ZHANG, F. 2020. The Multifaceted Roles of RCC1 in Tumorigenesis. Front Mol Biosci, 7, 225. DOI:10.3389/fmolb.2020.00225, PMID:33102517

      XUAN, Y. & WANG, Y. 2019. Long non-coding RNA SNHG3 promotes progression of gastric cancer by regulating neighboring MED18 gene methylation. Cell Death Dis, 10, 694. DOI:10.1038/s41419-019-1940-3, PMID:31534128

      (5) The claim that Snhg3 regulates SND1 protein stability seems subtle. There is data inconsistency between different panels regarding this regulation including Figure 5I, Figure 6A, and Figure 7E. In addition, is ubiquitination happening in the nucleus where Snhg3 is expressed?

      We thank the reviewer for the detailed comment. The effect of Snhg3 on SND1 expression was confirmed by western blotting; please see Figure 5I, Figure 6A, Figure 7E and the corresponding primary data. Additionally, the effect of Snhg3 on SND1 protein stability seemed subtle, indicating that there may be other mechanisms by which Snhg3 promotes SND1, such as riboregulation. We have added this to the Discussion section; please see the second paragraph on p16.

      Additionally, we did not determine where SND1 is modified by ubiquitination. Our results showed that Snhg3 was more localized in the nucleus (Figure 1D) and that Snhg3 also promoted the nuclear localization of SND1 (Figure 5O). We have revised the diagram of Snhg3 action in Figure 8G. Please see the revised manuscript.

      (6) The authors show that the loss of Snhg3 changes the global H3K27me3 level. Few enzymes modify H3K27me3 levels. Did the authors check for an interaction between EZH2, Jmjd3, UTX, and Snhg3/SND1?

      We thank the reviewer for the detailed comment. It is crucial to ascertain whether SND1 itself functions as a new demethylase or whether it influences other enzymes that modify H3K27 methylation, such as the demethylases Jmjd3 and ubiquitously transcribed tetratricopeptide repeat on chromosome X (UTX) and the methyltransferase enhancer of zeste homolog 2 (EZH2). The precise mechanism by which SND1 regulates H3K27me3 is still unclear and hence requires further investigation. We have added this limitation to the Discussion section; please see the third paragraph on p17.

      (7) Can the authors speculate if the findings related to Snhg3/SND1 extend to humans?

      We thank the reviewer for the detailed comment. Since the sequence of Snhg3 is not conserved between mice and humans, the findings in this manuscript may not be applicable to humans, but the details need to be further explored.

      (8) As a general rule the figures are too small or difficult to read with limited details in the figure legends which limits evaluation. For example, Figure 1B and almost all of 4 cannot read labels. Figure 2, cannot see the snapshots show of mice or livers. What figure is supporting the claim that snhg3KI are more 'hyper-accessible'? Can the authors clarify what Figure 4H is referring to?

      We thank the reviewer for the detailed comment. We have provided high quality figures in our revised manuscript.

      The ‘hyper-accessible’ state in the liver of Snhg3-HKI mice was inferred from the differentially accessible regions (DARs): we found that 4305 DARs were more accessible in Snhg3-HKI mice, whereas only 2505 DARs were more accessible in control mice (please refer to table supplement 3).

      The heatmap for Cd36 in Figure 4H was derived from hepatic RNA-seq of DIO Snhg3-HKI and control WT mice. To avoid ambiguity, we have removed it.

      (9) Authors stated that upon Snhg3 knock out, more genes are upregulated(1028) than downregulated(365). This description does not match Figure 4A. It seems in Figure 4A there are equal numbers of up and downregulated genes.

      We thank the reviewer for the detailed question. We apologize for this mistake and have corrected it.

      (10) Provide a schematic of the knockout and KI strategy in the supplement.

      We thank the reviewer for the detailed comment. We have included the knockout and KI strategy in figure supplement 1A and B, and 2A.

      Reviewer #2 (Recommendations For The Authors):

      (1) Metabolic cage data need to be reanalyzed with CalR (particularly when the body weights are significantly different).

      We thank the reviewer for the detailed comment. We reanalyzed the metabolic cage data using CalR (Mina et al., 2018). The results showed no obvious differences in heat production, total oxygen consumption, carbon dioxide production or the respiratory exchange ratio (RER) between DIO Snhg3-HKO and control mice. Similarly, there were no differences in heat production, total oxygen consumption, carbon dioxide production or RER between DIO Snhg3-HKI mice and WT mice. Please see figure supplement 1C and 2C, and Mouse Calorimetry in Materials and Methods.

      Reference

      MINA, A. I., LECLAIR, R. A., LECLAIR, K. B., COHEN, D. E., LANTIER, L. & BANKS, A. S. 2018. CalR: A Web-Based Analysis Tool for Indirect Calorimetry Experiments. Cell Metab, 28, 656-666 e1. DOI:10.1016/j.cmet.2018.06.019, PMID:30017358

      (2) ITT in Figure 2F should also be presented as % of the initial glucose level, which would reveal that there is no difference between WT and KO.

      We thank the reviewer for the detailed comment. We repeated the ITT experiment and included the new data in the revised manuscript; please see Figure 2C.
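The normalization the reviewer asks for can be sketched as follows. This is only an illustration of expressing each ITT reading as a percentage of the time-0 glucose level; the function name and the glucose values are hypothetical and are not data from the study.

```python
def percent_of_baseline(glucose_series):
    """Express each glucose reading as a percentage of the first (time-0) reading."""
    baseline = glucose_series[0]
    if baseline <= 0:
        raise ValueError("baseline glucose must be positive")
    return [100.0 * value / baseline for value in glucose_series]


# Hypothetical readings (mg/dL) at successive time points after insulin injection.
# The two groups start from different baselines but show the same relative drop.
wt_readings = [200, 140, 100, 120, 160]
ko_readings = [160, 112, 80, 96, 128]

wt_percent = percent_of_baseline(wt_readings)
ko_percent = percent_of_baseline(ko_readings)
```

In this made-up example the absolute curves differ, yet both normalized curves are identical, which is the situation the reviewer's comment describes.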

      (3) The fasting glucose results are inconsistent between ITT and GTT. Is there any difference in fasting glucose?

      We thank the reviewer for the questions. The difference between GTT and ITT arose from the different fasting times: mice were fasted for 6 h before the ITT and for 16 h before the GTT. Snhg3 does not appear to affect fasting glucose levels after either the shorter or the longer fast; please refer to Figures 2C and 3C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses Reviewer 1: 

      There are a number of weaknesses in the study. The small sample size is a significant limitation of the study. Out of 31 patients, only 17 patients were reported to develop neuropathy, with significant neuropathy (grade 2/3) in only 5 patients. The authors acknowledge this limitation in the results and discussion sections of the manuscript, but it limits the interpretation of the results. Also acknowledged is the limited method used to assess neuropathy. 

      We agree with the reviewer that the cohort size and the assessment of neuropathy are limitations of our study, as we already described in the corresponding section of the manuscript. However, the occurrence and grade of neuropathy are in line with results reported in previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70% (54.9% in our cohort), and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (13). In these studies, neuropathy is assessed by using questionnaires or by grading via NCI-CTCAE, as in our study. In summary, the assessment and occurrence of neuropathy in our cohort are in line with previous reports.

      Potentially due to this small number of patients with neuropathy, the machine learning algorithms could not distinguish between samples with and without neuropathy. Only selected univariate analyses identified differences in lipid profiles potentially related to neuropathy.  

      The data analysis consistently followed a "mixture of experts" approach, as this seems to be the most successful way to deal with omics data. We have elaborated on this in the Methods section, including several supporting references. Regarding the quoted sentence from the results section, after rereading it, we realized that it was somewhat awkwardly worded. What we mean is now better worded in the results section, namely “Although the three algorithms detected neuropathy in new cases, unseen during training, at balanced accuracy of up to 0.75, while only the guess level of 0.5 was achieved when using permuted data for training, the 95% CI of the performance measures was not separated from guess level”. Therefore, multivariate feature selection was not considered a valid approach, since it requires that the algorithms from which the feature importance is read can successfully perform their task of class assignment (4). Therefore, univariate methods (Cohen's d, FPR, FWE) were preferred, as well as a direct hypothesis transfer of the top hits from the abovementioned day1/2 assessments to neuropathy. Classical statistics consisting of direct group comparisons using Kruskal-Wallis tests (5) were performed.” 
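As a minimal sketch of why 0.5 is the guess level for balanced accuracy: the metric is the mean of the per-class recalls, so a classifier that ignores the input and always predicts one class scores exactly 0.5 regardless of class imbalance. The 17/14 split below mirrors the cohort sizes quoted by the reviewer; the predictions are hypothetical and the function is an illustration, not the authors' pipeline.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for a binary problem with labels 0 and 1."""
    recalls = []
    for cls in (0, 1):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        if not indices:
            raise ValueError(f"class {cls} is absent from y_true")
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# 17 patients with neuropathy (1) and 14 without (0), as in the cohort size above.
y_true = [1] * 17 + [0] * 14

# A trivial classifier that always predicts "neuropathy".
always_positive = [1] * len(y_true)
guess_level = balanced_accuracy(y_true, always_positive)
```

A reported balanced accuracy is only informative when its confidence interval excludes this 0.5 level, which is the criterion the response applies.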

      It was our approach to investigate the data set in an unbiased manner by different machine learning algorithms and select those lipids that the majority of the algorithms considered important for distinguishing the patient groups (majority voting). This way, the inconsistencies and limitations of a single evaluation method, such as regression analysis, that occur in some datasets, can be mitigated. 
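
      The majority-voting step can be sketched as follows. This is only an illustration under assumed inputs: each algorithm is taken to return a set of lipids it considers informative, and the lipid names in the usage note are placeholders, not results from the study.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Keep features selected by more than half of the algorithms.

    selections: one collection of feature names per algorithm.
    min_votes:  optional override of the strict-majority threshold.
    """
    n_algorithms = len(selections)
    if min_votes is None:
        min_votes = n_algorithms // 2 + 1  # strict majority
    # set() ensures each algorithm contributes at most one vote per feature
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, count in votes.items() if count >= min_votes)
```

      For example, with three hypothetical selections `{"lipid_A", "lipid_B"}`, `{"lipid_A", "lipid_C"}` and `{"lipid_A", "lipid_B", "lipid_D"}`, only `lipid_A` and `lipid_B` receive at least two votes and are retained.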

      Three sphingolipid mediators including SA1P differed between patients with and without neuropathy at the end of treatment. These sphingolipids were elevated at the end of treatment in the cohort with neuropathy, relative to those without neuropathy. However, across all samples from pre to post-paclitaxel treatment, there was a significant reduction in SA1P levels. It is unclear from the data presented what the underlying mechanism for this result would be. 

      We agree with the reviewer that our study does not identify the mechanism by which paclitaxel treatment alters sphingolipid concentrations in the plasma of patients. It has been reported before that paclitaxel may increase expression and activity of serine palmitoyltransferase (SPT), which is the crucial enzyme and rate-limiting step in the de novo synthesis of sphingolipids. This may be associated with a shift towards increased synthesis of 1-deoxysphingolipids and a decrease of “classical” sphingolipids (6) and may explain the general reduction of SA1P and other sphingolipid levels after paclitaxel treatment in our study.

      It is also conceivable that paclitaxel reduces the release of sphingolipids into the plasma. Paclitaxel is a microtubule stabilizing agent (7) that may interfere with intracellular transport processes and release of paracrine mediators. 

      The mechanistic details of paclitaxel involvement in sphingolipid metabolism or transport are highly interesting but identifying them is beyond the scope of our manuscript.

      If elevated SA1P is associated with neuropathy development, it would be expected to increase in those who develop neuropathy from pre to post-treatment time points. 

      There is a general trend of reduced plasma SA1P concentrations following paclitaxel treatment. Nevertheless, patients experiencing neuropathy exhibit significantly elevated SA1P levels post-treatment. 

      It has been shown before in a preclinical study that paclitaxel-induced neuropathic pain requires activation of the S1P1 receptor (8). Moreover, a meta-analysis of genome-wide association studies (GWAS) from two clinical cohorts identified multiple regulatory elements and increased activity of S1PR1 associated with paclitaxel-induced neuropathy (9). These data imply that enhanced S1P receptor activity and signaling are key drivers of paclitaxel-induced neuropathy. It seems that increased levels of the sphingolipid ligands, in combination with enhanced expression and activity of S1P receptors, can potentiate paclitaxel-induced neuropathy in patients. This explains why even decreased SA1P concentrations after paclitaxel treatment can still enhance neuropathy via the S1PR-TRPV1 axis in sensory neurons.

      We added this paragraph to the discussions section of our manuscript.

      Primary sensory neuron cultures were used to examine the effects of SA1P application.

      SA1P application produced calcium transients in a small proportion of sensory neurons. It is not clear how this experimental model assists in validating the role of SA1P in neuropathy development as there is no assessment of sensory neuron damage or other hallmarks of peripheral neuropathy. These results demonstrate that some sensory neurons respond to SA1P and that this activity is linked to TRPV1 receptors. However, further studies will be required to determine if this is mechanistically related to neuropathy.

      As we detected elevated levels of SA1P in the plasma of PIPN patients, we can assume higher concentrations in the vicinity of sensory neurons. These neurons are the main drivers for neuropathy and neuropathic pain and are strongly affected by paclitaxel in their activity (10-15). Also, TRPV1 shows altered activity patterns in response to paclitaxel treatment (16). Because of its relevance for nociception and pathological pain, TRPV1 activity is a suitable and representative readout for pathological pain states in peripheral sensory neurons (17, 18), which is why we investigated them.

      We would like to point out the potency of SA1P to increase capsaicin-induced calcium transients in sensory neurons at submicromolar concentrations.

      We also agree with the reviewer that further studies need to investigate the underlying mechanisms in more detail. We added this sentence to the final paragraph in the discussion section of our manuscript.

      Weaknesses Reviewer 2: 

      The article is poorly written, hindering a clear understanding of core results. While the study's goals are apparent, the interpretation of sphingolipids, particularly SA1P, as key mediators of paclitaxel-induced neuropathy lacks robust evidence. 

      We agree that the relevance of SA1P as a key mediator of paclitaxel-induced neuropathy might be overstated and changed the wording throughout the manuscript accordingly. However, we would like to point out the potency of this lipid to increase capsaicin-induced calcium transients in sensory neurons at submicromolar concentrations.

      Also, the lipid signature in the plasma of PIPN patients shows a unique pattern and sphingolipids are the group that showed the strongest alterations when comparing the patient groups. We also measured eicosanoids, such as prostaglandins, linoleic acid metabolites, endocannabinoids and other lipid groups that have previously been associated with influences on pain perception or nociceptor sensitization. However, none of these lipids showed significant differences in their concentrations in patient plasma. This is why we consider sphingolipids as contributors to or markers of paclitaxel-induced neuropathy in patients.

      We also revised the entire article to improve its clarity.

      The introduction fails to establish the significance of general neuropathy or peripheral neuropathy in anticancer drug-treated patients, and crucial details, such as the percentage of patients developing general neuropathy or peripheral neuropathy, are omitted. This omission is particularly relevant given that only around 50% of patients developed neuropathy in this study, primarily of mild Grade 1 severity with negligible symptoms, contradicting the study's assertion of CIPN as a significant side effect. 

      As we already described in the introduction, CIPN is a serious dose- and therapy-limiting side effect, which affects up to 80% of treated patients. This depends on dose and combination of chemotherapeutic agents. For paclitaxel, therapeutic doses range from 80 to 225 mg/m². As CIPN symptoms are dose-dependent, the number of PIPN patients that receive a high paclitaxel dose is higher than the number of PIPN patients receiving a low dose.

      In our study, we mainly used low-dose paclitaxel, because this therapeutic regimen is the most widely used paclitaxel monotherapy. From previous studies, the expected occurrence of neuropathy with this therapeutic regimen is around 50-70%, and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3).

      Our results are within the range reported by these studies (54.9% patients with neuropathy). Also, as we highlight in Table S1, the neuropathy symptoms persist in most cases for several years after chemotherapy, affecting quality of life of these patients which makes it far from being a negligible symptom.

      We added some more information concerning PIPN in the introduction section in which we emphasize the clinical problem.

      The lack of clarity in distinguishing results obtained by lipidomics using machine learning methods and conventional methods adds to the confusion. The poorly written results section fails to specify SA1P's downregulation or upregulation, and the process of narrowing down to sphingolipids and SA1P is inadequately explained. 

      We have tried to keep the machine learning part in the main manuscript short and moved major parts of it to a supplement. However, as this has been claimed to have led to a lack of clarity, we have expanded the description of the data analysis and added extensive explanations and supporting references for the mixture of experts approach that was used throughout the analysis. We hope this is now clear.

      Integrating a significant portion of the discussion section into the results section could enhance clarity. An explanation of the utility of machine learning in classifying patient groups over conventional methods and the citation of original research articles, rather than relying on review articles, may also add clarity to the usefulness of the study. 

      As suggested by the reviewer, we moved the relevant parts from the discussion to the results section in the revised version of our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 2 should be better explained or removed. In its current form, it does not add to the interpretation of the manuscript.  

      As mentioned above, we have expanded the description of the ESOM/U-matrix method in the Methods section and rewritten the figure legend. In addition, we have annotated the U-matrix in the figure. The method has been reported extensively in the computer science and biomedical literature, and a more detailed description in the referenced papers would go beyond the current focus on lipidomics. However, we believe that this discussion is sufficiently detailed for the readers of this report: "… a second unsupervised approach was used to verify the agreement between the lipidomics data structure and the prior classification, implemented as self-organizing maps (SOM) of artificial neurons (19). In the special form of an “emergent” SOM (ESOM (20)), the present map consisted of 4,000 neurons arranged on a two-dimensional toroidal grid with 50 rows and 80 columns (21, 22). ESOM was used because it has been repeatedly shown to correctly detect subgroup structures in biomedical data sets comparable to the present one (20, 22, 23). The core principle of SOM learning is to adjust the weights of neurons based on their proximity to input data points. In this process, the best matching unit (BMU) is identified as the neuron closest to a given data point. The adaptation of the weights is determined by a learning rate (η) and a neighborhood function (h), both of which gradually decrease during the learning process. Finally, the groups are projected onto separate regions of the map. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix (24) which is the canonical tool for displaying the distance structures of input data on ESOM (21). 

      The visual presentation facilitates data group separation by displaying the distances between BMUs in high-dimensional space in a color-coding that uses a geographical map analogy, where large "heights" represent large distances in feature space, while low "valleys" represent data subsets that are similar. "Mountain ranges" with "snow-covered" heights visually separate the clusters in the data. Further details about ESOM can be found in (24)."
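
      The SOM learning principle quoted above (find the best matching unit, then pull neuron weights towards the data point, scaled by a learning rate η and a neighborhood function h) can be sketched for a single training step. The Gaussian neighborhood and all names below are assumptions made for illustration; this is not the Umatrix implementation the authors used, and the decay schedules for η and the neighborhood width are left to the caller.

```python
import math


def toroidal_distance(a, b, rows, cols):
    """Grid distance between neurons a and b on a grid that wraps at its edges."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # wrap around vertically
    dc = min(dc, cols - dc)  # wrap around horizontally
    return math.hypot(dr, dc)


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)),
    )


def som_update(weights, x, rows, cols, eta, sigma):
    """One learning step: w <- w + eta * h(d) * (x - w) for every neuron.

    weights: dict mapping (row, col) to a weight vector (list of floats).
    eta:     learning rate; sigma: width of the assumed Gaussian neighborhood.
    """
    bmu = best_matching_unit(weights, x)
    for pos, w in list(weights.items()):
        d = toroidal_distance(pos, bmu, rows, cols)
        h = math.exp(-(d * d) / (2.0 * sigma * sigma))
        weights[pos] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

      On the 50 x 80 toroidal grid described above, neurons in column 0 and column 79 of the same row are direct neighbors, which is what the wrap-around in `toroidal_distance` expresses.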

      The second patient cohort is only included in the discussion - with cohort details in the supplementary material and figures included in the main text. Perhaps these data should be removed entirely. The findings are described as trends and not statistically significant and multiple issues with this second cohort are mentioned in the discussion. 

      We agree with the reviewer that including the second patient cohort in the discussion is inadequate. Of course, there are differences between the patient cohorts that do not allow direct comparison and that are highlighted in the section on limitations of the study. However, we still think it is interesting and relevant to show these data, because we used our algorithms trained on the first patient cohort to analyze the second cohort. And these data support the main results. 

      We therefore moved the entire paragraph to the results section to improve the coherence of our manuscript. The passage was introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      The title does not reflect the content of the paper and should be changed to better reflect the content and its significance. 

      We changed the title to “Machine learning and biological validation identify sphingolipids as potential mediators of paclitaxel-induced neuropathy in cancer patients” to avoid overstating the results as suggested by the Reviewer.

      Further, the discussion should be modified to avoid overstating the results. 

      As the reviewer suggests, we changed the wording to avoid overstating the results. 

      Reviewer #2 (Recommendations For The Authors): 

      Please address the absence of clear neuropathy in the majority of patients after treatment with paclitaxel in your discussion. 

      As stated above, occurrence and grade of the neuropathy are in line with the results from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70% (the variability is due to differences in the assessment methods), and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3).

      We added this information in the discussion section of the revised manuscript.

      Line 65: Kindly replace review articles with original research articles for proper citation. 

      We replaced the review articles with original publications, focusing on clinical observations. We added the following publications: Jensen et al., Front Neurosci 2020; Chen et al., Neurobiol Aging 2018; Igarashi et al., J Alzheimers Dis. 2011; Kim et al., Oncotarget 2017 as references 17-20 in the revised version of our manuscript.

      Line 260: The mention of SA1P is introduced here without prior reference (do not use words like "again", or "see above", if it is not previously mentioned). Adjust the text for coherence.

      We agree with the reviewer that the introduction of SA1P in this passage is incoherent. We replaced the sentence in line 260 with:

      “The small set of lipid mediators emerging from all three methods as informative for neuropathy included the sphingolipid sphinganine-1-phosphate (SA1P), also known as dihydrosphingosine-1-phosphate (DH-S1P)…”

      Lines 301-315: Consider relocating several lines from this section to the results section for improved clarity. 

      We moved lines 309-312, which explain the algorithm selection and their validation success, to the corresponding results section (Lipid mediators informative for assigning post-paclitaxel therapy samples to neuropathy).

      Lines 382-396: Move this content to the results section to enhance the organization and coherence of the manuscript. 

      We moved the entire paragraph to the results section of our manuscript to improve coherence. The passage was introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      References

      (1) Barginear M, Dueck AC, Allred JB, Bunnell C, Cohen HJ, Freedman RA, et al. Age and the Risk of Paclitaxel-Induced Neuropathy in Women with Early-Stage Breast Cancer (Alliance A151411): Results from 1,881 Patients from Cancer and Leukemia Group B (CALGB) 40101. Oncologist. 2019;24(5):617-23.

      (2) Mauri D, Kamposioras K, Tsali L, Bristianou M, Valachis A, Karathanasi I, et al. Overall survival benefit for weekly vs. three-weekly taxanes regimens in advanced breast cancer: A meta-analysis. Cancer Treat Rev. 2010;36(1):69-74.

      (3) Budd GT, Barlow WE, Moore HC, Hobday TJ, Stewart JA, Isaacs C, et al. SWOG S0221: a phase III trial comparing chemotherapy schedules in high-risk early-stage breast cancer. J Clin Oncol. 2015;33(1):58-64.

      (4) Lötsch J, and Ultsch A. Pitfalls of Using Multinomial Regression Analysis to Identify Class-Structure-Relevant Variables in Biomedical Data Sets: Why a Mixture of Experts (MOE) Approach Is Better. BioMedInformatics. 2023;3(4):869-84.

      (5) Kruskal WH, and Wallis WA. Use of Ranks in One-Criterion Variance Analysis. J Am Stat Assoc. 1952;47(260):583-621.

      (6) Kramer R, Bielawski J, Kistner-Griffin E, Othman A, Alecu I, Ernst D, et al. Neurotoxic 1-deoxysphingolipids and paclitaxel-induced peripheral neuropathy. FASEB J. 2015;29(11):4461-72.

      (7) Field JJ, Diaz JF, and Miller JH. The binding sites of microtubule-stabilizing agents. Chem Biol. 2013;20(3):301-15.

      (8) Janes K, Little JW, Li C, Bryant L, Chen C, Chen Z, et al. The development and maintenance of paclitaxel-induced neuropathic pain require activation of the sphingosine 1-phosphate receptor subtype 1. J Biol Chem. 2014;289(30):21082-97.

      (9) Chua KC, Xiong C, Ho C, Mushiroda T, Jiang C, Mulkey F, et al. Genomewide Meta-Analysis Validates a Role for S1PR1 in Microtubule Targeting Agent-Induced Sensory Peripheral Neuropathy. Clin Pharmacol Ther. 2020;108(3):625-34.

      (10) Kawakami K, Chiba T, Katagiri N, Saduka M, Abe K, Utsunomiya I, et al. Paclitaxel increases high voltage-dependent calcium channel current in dorsal root ganglion neurons of the rat. J Pharmacol Sci. 2012;120(3):187-95.

      (11) Pittman SK, Gracias NG, Vasko MR, and Fehrenbacher JC. Paclitaxel alters the evoked release of calcitonin gene-related peptide from rat sensory neurons in culture. Exp Neurol. 2013.

      (12) Luo H, Liu HZ, Zhang WW, Matsuda M, Lv N, Chen G, et al. Interleukin-17 Regulates Neuron-Glial Communications, Synaptic Transmission, and Neuropathic Pain after Chemotherapy. Cell Rep. 2019;29(8):2384-97 e5.

      (13) Pease-Raissi SE, Pazyra-Murphy MF, Li Y, Wachter F, Fukuda Y, Fenstermacher SJ, et al. Paclitaxel Reduces Axonal Bclw to Initiate IP3R1-Dependent Axon Degeneration. Neuron. 2017;96(2):373-86 e6.

      (14) Duggett NA, Griffiths LA, and Flatters SJL. Paclitaxel-induced painful neuropathy is associated with changes in mitochondrial bioenergetics, glycolysis, and an energy deficit in dorsal root ganglia neurons. Pain. 2017.

      (15) Li Y, Adamek P, Zhang H, Tatsui CE, Rhines LD, Mrozkova P, et al. The Cancer Chemotherapeutic Paclitaxel Increases Human and Rodent Sensory Neuron Responses to TRPV1 by Activation of TLR4. J Neurosci. 2015;35(39):13487-500.

      (16) Hara T, Chiba T, Abe K, Makabe A, Ikeno S, Kawakami K, et al. Effect of paclitaxel on transient receptor potential vanilloid 1 in rat dorsal root ganglion. Pain. 2013;154(6):882-9.

      (17) Jardin I, Lopez JJ, Diez R, Sanchez-Collado J, Cantonero C, Albarran L, et al. TRPs in Pain Sensation. Front Physiol. 2017;8:392.

      (18) Julius D. TRP Channels and Pain. Annu Rev Cell Dev Biol. 2013;29:355-84.

      (19) Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern. 1982;43(1):59-69.

      (20) Lötsch J, Lerch F, Djaldetti R, Tegeder I, and Ultsch A. Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix). Big Data Analytics. 2018;3(1):5.

      (21) Ultsch A. 2003.

      (22) Lötsch J, Geisslinger G, Heinemann S, Lerch F, Oertel BG, and Ultsch A. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B-induced local skin hypersensitization in healthy subjects: a machine-learned analysis. Pain. 2018;159(1):11-24.

      (23) Lötsch J, Thrun M, Lerch F, Brunkhorst R, Schiffmann S, Thomas D, et al. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects. Int J Mol Sci. 2017;18(6).

      (24) Lötsch J, and Ultsch A. Cham: Springer International Publishing; 2014:249-57.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors propose that the energy landscape of animals can be thought of in the same way as the fundamental versus realized niche concept in ecology. Namely, animals will use a subset of the fundamental energy landscape due to a variety of factors. The authors then show that the realized energy landscape of eagles increases with age as the animals are better able to use the energy landscape.

      Strengths:

      This is a very interesting idea that adds significantly to the energy landscape framework. They provide convincing evidence that the available regions used by birds increase with size.

      Weaknesses:

      Some of the measures used in the manuscript are difficult to follow and there is no mention of the morphometrics of birds or how these change with age (other than that they don’t change which seems odd as surely they grow). Also, there may need to be more discussion of other ontogenetic changes such as foraging strategies, home range size etc.

      We thank reviewer 1 for their interest in our study and for their constructive recommendations. We have included further discussions of these points in the manuscript and outline these changes in our responses to the detailed recommendations below.

      Reviewer 2 (Public Review):

      Summary:

      With this work, the authors tried to expand and integrate the concept of realized niche in the context of movement ecology by using fine-scale GPS data of 55 juvenile Golden eagles in the Alps. Authors found that ontogenic changes influence the percentage of area flyable to the eagles as individuals exploit better geographic uplifts that allow them to reduce the cost of transport.

      Strengths:

      Authors made insightful work linking changes in ontogeny and energy landscapes in large soaring birds. It may not only advance the understanding of how changes in the life cycle affect the exploitability of aerial space but also offer valuable tools for the management and conservation of large soaring species in the changing world.

      Weaknesses:

      Future research may test the applicability of the present work by including more individuals and/or other species from other study areas.

      We are thankful to reviewer 2 for their encouragement and positive assessment of our work. We have addressed their specific recommendations below.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      I found this to be a very interesting paper which adds some great concepts and ideas to the energy landscape framework. The paper is also concise and well-written. While I am enthusiastic about the paper there are areas that need clarifying or need to be made clearer. Specific comments below:

      Line 64: I disagree that competition is the fundamental driver of the realized niche. In some cases, it may be but in others, predation may be far more important (as an example).

      We agree with this point and have now clarified that competition is an example of a driver of the realized niche. We have also included predation as another example:

      "However, just as animals do not occupy the entirety of their fundamental Hutchinsonian niche in reality [1], for example due to competition or predation risk, various factors can contribute to an animal not having access to the entirety of its fundamental movement niche."

      Intro: I think the authors should emphasize that morphological changes with ontogeny will change the energy landscape for many animals. It may not be the case specifically with eagles but that won’t be true for other animals. For example, in many sharks, buoyancy increases with age.

      We agree and have now clarified that the developmental processes that we are interested in happen in addition to morphological changes:

      "In addition to morphological changes, as young animals progress through their developmental stages, their movement proficiency [2] and cognitive capabilities [3] improve and memory manifests [4]."

      Line 91-93: The idea that birds fine-tune motor performance to take advantage of updrafts is a very important one to the manuscript and should be discussed in a bit more detail. How? At the moment there is a single sentence and it doesn’t even have a citation yet this is the main crux of the changes in realized energy landscape with age. This point should be emphasized because, by the end of the introduction, it is not clear to me why the landscape should be cheaper as the birds age?

      Thank you for pointing out this missing information. We have now added examples to clarify how soaring birds fine-tune their motor performance when soaring. These include for example adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]:

      "Soaring flight is a learned and acquired behavior [7, 8], requiring advanced cognitive skills to locate uplifts as well as fine-tuned locomotor skills for optimal adjustment of the body and wings to extract the most energy from them, for example by adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]."

      Results:

      Line 106: explain the basics of the life history of the birds in the introduction. I have no idea what emigration refers to or the life history of these animals.

      Thank you for pointing out the missing background information. We have now added this information to the introduction:

      "We analyzed 46,000 hours of flight data collected from bio-logging devices attached to 55 wild-ranging golden eagles in the Central European Alps. These data covered the transience phase of natal dispersal (hereafter post-emigration). In this population, juveniles typically achieve independence by emigrating from the parental territory within 4-10 months after fledging. However, due to the high density of eagles and consequently the scarcity of available territories, the transience phase between emigration and settling by eventually winning over a territory is exceptionally long at well over 4 years. Our hypothesis posited that the realized energy landscape during this transience phase gradually expands as the birds age."

      What I still am having a hard time understanding is the flyability index. Is this just a measure of the area animals actively select and then the assumption that it’s a good region to fly within?

      We have modified our description of the flyability index for more clarity. In short, we built a step-selection model and made predictions using this model. The predictions estimate the probability of use of an area based on the predictors of the model. For the purpose of our study and what our predictors were (proxies for uplift + movement capacity), we interpreted the predicted values as the "flyability index". We have now clarified this in the methods section:

      "We made the predictions on the scale of the link function and converted them to values between 0 and 1 using the inverse logit function [9]. These predicted values estimated the probability of use of an area for flying based on the model. We interpreted these predicted values as the flyability index, representing the potential energy available in the landscape to support flight, based on the uplift proxies (TRI and distance to ridge line) and the movement capacity (step length) of the birds included in the model."
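
      The conversion from the link scale to a value between 0 and 1 can be sketched as below. The covariate names and coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted estimates of the step-selection model.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to (0, 1), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def flyability_index(coefficients, covariates):
    """Inverse logit of the linear predictor sum(beta_k * x_k).

    coefficients, covariates: dicts keyed by (hypothetical) covariate name;
    covariates are assumed to be z-transformed like the model inputs.
    """
    eta = sum(coefficients[name] * covariates[name] for name in coefficients)
    return inverse_logit(eta)


# Hypothetical values, for illustration only.
example_coefficients = {"tri": 0.4, "dist_to_ridge": -0.6, "step_length": -0.3}
example_cell = {"tri": 1.2, "dist_to_ridge": -0.5, "step_length": 0.0}
example_index = flyability_index(example_coefficients, example_cell)
```

      A linear predictor of 0 maps to 0.5, and larger positive values approach 1, so the index can be compared across cells and weeks on a common scale.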

      It might also be useful to simply show the changes in the area the animals use with age as well (i.e. a simple utilization distribution). This should increase in age for many animals but would also be a reflection of the resources animals need to acquire as they get older.

      We have now added the figure S2 to the supplementary material. This plot was created by calculating the cumulative area used by the birds in each week after emigration. This was done by extracting the commuting flights for each week, converting these to line objects, overlapping the lines with a raster of 100*100 m cell size, counting the number of overlapping cells and calculating the area that they covered. We did not calculate UDs or MCPs because the eagles seem to be responding to linear features of the landscape, e.g. preferring ridgelines and avoiding valleys. Using polygons to estimate used areas would have made it difficult to ensure that decision-making with regards to these linear features was captured.
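
      The cell-counting procedure described above can be approximated with a short sketch. It assumes tracks are given as projected coordinates in metres, approximates the line/raster overlap by sampling points along each segment at intervals shorter than a cell (so a cell that a line only clips at a corner may be missed), and accumulates cells across weeks. These are simplifying assumptions; the authors' GIS workflow is not reproduced here and all names are hypothetical.

```python
import math

CELL_SIZE_M = 100.0  # 100 x 100 m raster cells, as stated in the response


def cells_crossed(track, cell_size=CELL_SIZE_M, step_fraction=0.25):
    """Set of (col, row) raster cells touched by a track of (x, y) points in metres."""
    cells = set()
    step = cell_size * step_fraction
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / step))
        for k in range(n + 1):
            t = k / n
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            cells.add((math.floor(x / cell_size), math.floor(y / cell_size)))
    return cells


def cumulative_area_km2(weekly_tracks, cell_size=CELL_SIZE_M):
    """Cumulative area (km^2) covered up to each week.

    weekly_tracks: list ordered by week; each item is a list of tracks.
    """
    seen = set()
    areas = []
    for tracks in weekly_tracks:
        for track in tracks:
            seen |= cells_crossed(track, cell_size)
        areas.append(len(seen) * cell_size ** 2 / 1e6)
    return areas
```

      Because cells are stored in a set, a flight that repeats an earlier route adds no new area, so the curve only rises when the bird commutes through previously unused cells.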

      In a follow-up project, a PhD student in the golden eagle consortium is exploring the individuals’ space use after emigration considering different environmental and social factors. The outcome of that study will further complete our understanding of the post-emigration behavior of juvenile golden eagles in the Alps.

      How much do the birds change in size over the ontogeny measured? This is never discussed.

      Thank you for bringing up this question. The morphometrics of juvenile golden eagles are not significantly different from the adults, except in the size of culmen and claws [10]. Body mass changes after fledging, because of the development of the pectoral muscles as the birds start flying. Golden eagles typically achieve adult-like size and mass within their natal territory before emigration, at which time we started quantifying the changes in energy landscape. Given our focus on post-emigration flight behavior, we do not expect any significant changes in size and body mass during our study period. We now cover this in the discussion:

      "Juvenile golden eagles complete their morphological development before gaining independence from their parents, with their size and wing morphology remaining stable during the post-emigration phase [10, 11]. Consequently, variations in flyability of the landscape for these birds predominantly reflect their improved mastery of soaring flight, rather than changes in their morphology."

      Discussion:

      Line 154: Could the increase in step length also be due to changes in search strategies with age? e.g. from more Brownian motion when scavenging to Levy search patterns when actively hunting?

      This is a very good point and we tried to look for evidence of this transition in the tracking data. We explored the first passage time for two individuals with a radius of 50 km to see if there is a clear transition from a Brownian to a Levy motion. The patterns that emerge are inconclusive and seem to point to seasonality rather than a clear transition in foraging strategy (Author response image 1). We have modified our statement in the discussion about the change in preference of step lengths indicating improved flight ability, to clarify that it is speculative:

      Author response image 1.

      First passage times using a 50 km radius for two randomly selected individuals.

      "Our findings also reveal that as the eagles aged, they adopted longer step lengths, which could indicate an increasing ability to sustain longer uninterrupted flight bouts."

      Methods:

      Line 229: What is the cutoff for high altitude or high speed?

      We used the Expectation-maximization binary clustering (EMbC) method to identify commuting flights. The EMbC method does not use hard cutoffs to cluster the data. Each data point was assigned to the distribution to which it most likely belonged based on the final probabilities after multiple iterations of the algorithm. Author response image 2 shows the distribution of points that were either used or not used based on the EMbC classification.

      Author response image 2.

      Golden eagle tracking points were either retained (used) or discarded (not used) for further data analysis based on the EMbC algorithm. The points were clustered based on ground speed and height above ground.

      Figure 1: The figure captions should stand on their own but in this case there is no information as to what the tests are actually showing.

      We have now updated the caption to provide information about the model:

      "Coefficient estimates of the step selection function predicting probability of use as a function of uplift proxies, week since emigration, and step length. All variables were z-transformed prior to modeling.

      The error bars show 95% confidence intervals."
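      The z-transformation mentioned in the caption can be illustrated with a short sketch. It assumes the usual definition (subtract the mean, divide by the sample standard deviation); the response does not state which standard deviation was used, and the variable names and values below are hypothetical.

```python
import statistics

def z_transform(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - mean) / sd for v in values]

# Hypothetical step lengths in km.
step_lengths = [2.0, 4.0, 6.0, 8.0]
z = z_transform(step_lengths)
```

      After this transformation each predictor has mean 0 and standard deviation 1, so the coefficient estimates in the figure are comparable across variables.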

      Reviewer 2 (Recommendations For The Authors):

      First, I want to congratulate you on this fantastic work. I enjoyed reading it. The manuscript is clear and well-written, and the findings are sound and relevant to the field of movement ecology. Also, the figures are neatly presented and easy to follow.

      I particularly liked expanding the old concept of fundamental vs realized niche into a movement ecology context. I believe that adds a fresh view into these widely accepted ecological assumptions on species niche, which may help other researchers build upon them to better understand movement "realms" on highly mobile animals in a rapidly changing world.

      I made some minor comments to the manuscript since it was hard to find important weaknesses in it, given the quality of your work. However, there was a point in the discussion that I feel deserves your attention (or rather a reflection) on how major biological events such as moulting could also influence birds to master the flying and exploitation of the energy landscape. You may find my suggestion quite subjective, but I think it may help expand your idea for future works and, what is more, link concepts such as energy landscapes, ontogeny, and important life cycle events such as moulting in large soaring birds. I consider this relevant from a mechanistic perspective to understand better how individuals negotiate all three concepts to thrive and persist in changing environments and to maximise their fitness.

      Once again, congratulations on this excellent piece of research.

      We thank the reviewer for their enthusiasm about our work and for bringing up important points about the biology of the species. Our detailed responses are below.

      MINOR COMMENTS:

      (Note: Line numbers refer to those in the PDF version provided by the journal).

      Line 110: Distinguished (?)

      corrected

      Line 131: Overall, I agree with the authors’ discussion and very much liked how they addressed crucial points. However, I have a point about some missing non-discussed aspects of bird ecology that had not been mentioned.

      The authors argue that morphological traits are less important in explaining birds’ mastery of flight (thus exploiting all available options in the landscape). However, I think the authors are missing some fundamental aspects of bird biology that are known to affect birds’ flying skills, such as moult.

      The moulting process affects species’ flying capacity. Although previous works have not assessed moults’ impact on movement capacity, I think it is worth including the influence of flyability on this ecologically relevant process.

      For instance, golden eagles change their juvenile plumage to intermediate, sub-adult plumage in two or three moult cycles. During this process, the moulting process is incomplete and affects the birds’ aerodynamics, flying capacity, and performance (see Tomotani et al. 2018; Hedenström 2023). Thus, one could expect this process to be somewhat indirectly linked to the extent to which birds can exploit available resources.

      Hedenström, A. (2023). Effects of wing damage and moult gaps on vertebrate flight performance. Journal of Experimental Biology, 226(9), jeb227355.

      Tomotani, B. M., Muijres, F. T., Koelman, J., Casagrande, S., & Visser, M. E. (2018). Simulated moult reduces flight performance, but overlap with breeding does not affect breeding success in a long-distance migrant. Functional Ecology, 32(2), 389-401.

      We thank the reviewer for bringing up this relevant topic. We explored the literature listed by the reviewer and also other sources. We came to the conclusion that moulting does not impact our findings. In our study, we included data for eagles that had emigrated from the natal territories, with their fully grown feathers in juvenile plumage. The moulting schedule in juvenile birds is similar to that of adults: the timing, intensity, and sequence of feathers being replaced are consistent every year (Author response image 3). For these reasons, we do not believe that moulting stage noticeably impacts flight performance at the scale of our study (hourly flights). Fine details of soaring flight performance (aerodynamics within and between thermals) could differ during moulting of different primary and secondary feathers, but this is something that would occur every time the eagle replaces these feathers and we do not expect it to be any different for juveniles. Such fine-scale investigations are outside the scope of this study.

      Author response image 3.

      Moulting schedule of golden eagles [12]

      Lines 181-182: I don’t think trophic transitions rely only on individual flying skill changes. Furthermore, despite its predominant role, scavenging does not mean it is the primary source of food acquisition in golden eagles. This also depends on prey availability, and scavenging is an auxiliary font of easy-to-catch food.

      Scavenging implies detecting carcasses. Should this carcass appearance occur in highly rugged areas, the likelihood of detection also reduces notably. This is not to say that there are not more specialized carrion consumers, such as vultures, that may outcompete eagles in searching for such resources more efficiently.

      In summary, I don’t think such a transition relies only on flying skills but on other non-discussed factors such as knowledge accumulation of the area or even the presence of conspecifics.

      Line 183: This is precisely what I meant with my earlier comment.

      Thank you for the discussion on the interaction between flight development and foraging strategy. We explored the transition from scavenging to hunting above as a response to Reviewer 1, but did not find a clear transition. This is in line with your comment that the birds probably use both scavenging and hunting methods opportunistically.

      Lines 193-195: I will locate this sentence somewhere in this paragraph. As it is now, it seems a bit out of context. It could be a better fit at the end of the first point in line 203.

      Thank you for pointing out the issue with the flow. We have now added a transitional sentence before this one to improve the paragraph. The beginning of the conclusion now reads as follows, with the new sentence shown in boldface.

      "Spatial maps serve as valuable tools in informing conservation and management strategies by showing the general distribution and movement patterns of animals. These tools are crucial for understanding how animals interact with their environment, including human-made structures. Within this context, energy landscapes play an important role in identifying potential areas of conflict between animals and anthropogenic infrastructures such as wind farms. The predictability of environmental factors that shape the energy landscape has facilitated the development of these conservation tools, which have been extrapolated to animals belonging to the same ecological guild traversing similar environments."

      References

      (1) Colwell, R. K. & Rangel, T. F. Hutchinson’s duality: The once and future niche. Proceedings of the National Academy of Sciences 106, 19651–19658. doi:10.1073/pnas.0901650106 (2009).

      (2) Corbeau, A., Prudor, A., Kato, A. & Weimerskirch, H. Development of flight and foraging behaviour in a juvenile seabird with extreme soaring capacities. Journal of Animal Ecology 89, 20–28. doi:10.1111/1365-2656.13121 (2020).

      (3) Fuster, J. M. Frontal lobe and cognitive development. Journal of Neurocytology 31, 373–385. doi:10.1023/A:1024190429920 (2002).

      (4) Ramsaran, A. I., Schlichting, M. L. & Frankland, P. W. The ontogeny of memory persistence and specificity. Developmental Cognitive Neuroscience 36, 100591. doi:10.1016/j.dcn.2018.09.002 (2019).

      (5) Williams, H. J., Duriez, O., Holton, M. D., Dell’Omo, G., Wilson, R. P. & Shepard, E. L. C. Vultures respond to challenges of near-ground thermal soaring by varying bank angle. Journal of Experimental Biology 221, jeb174995. doi:10.1242/jeb.174995 (Dec. 2018).

      (6) Williams, H. J., King, A. J., Duriez, O., Börger, L. & Shepard, E. L. C. Social eavesdropping allows for a more risky gliding strategy by thermal-soaring birds. Journal of The Royal Society Interface 15, 20180578. doi:10.1098/rsif.2018.0578 (2018).

      (7) Harel, R., Horvitz, N. & Nathan, R. Adult vultures outperform juveniles in challenging thermal soaring conditions. Scientific reports 6, 27865. doi:10.1038/srep27865 (2016).

      (8) Ruaux, G., Lumineau, S. & de Margerie, E. The development of flight behaviours in birds. Proceedings of the Royal Society B: Biological Sciences 287, 20200668. doi:10.1098/rspb.2020.0668 (2020).

      (9) Bolker, B., Warnes, G. R. & Lumley, T. Package gtools. R Package "gtools" version 3.9.4 (2022).

      (10) Bortolotti, G. R. Age and sex size variation in Golden Eagles. Journal of Field Ornithology 55, 54–66 (1984).

      (11) Katzner, T. E., Kochert, M. N., Steenhof, K., McIntyre, C. L., Craig, E. H. & Miller, T. A. Birds of the World (eds Rodewald, P. G. & Keeney, B. K.) chap. Golden Eagle (Aquila chrysaetos), version 2.0. doi:10.2173/bow.goleag.02 (Cornell Lab of Ornithology, Ithaca, NY, USA, 2020).

      (12) Bloom, P. H. & Clark, W. S. Molt and sequence of plumages of Golden Eagles and a technique for in-hand ageing. North American Bird Bander 26, 2 (2001).

    1. Author response:

      (1) Clarification and Detailed Explanation in the Methods Section:

      - Regarding Reviewer 1's comments about the unclear explanation of the update process for pseudotime, T, and the selection of important genes/features at bifurcation points in the methods, we will provide a detailed description of the update process for pseudotime T and how high-weight genes important to the bifurcation process are selected.

      - Regarding Reviewer 2's comments concerning the impact of the initial pseudotime prediction method and the insufficient description of various parameters, we will add information about the differences in the initially used pseudotime prediction methods and provide detailed information on the techniques and parameters used in each analysis.

      - Regarding Reviewer 2's comments on the choice of kernel functions, we will explain the rationale for selecting rbf and polynomial kernels and why other options were discarded.

      (2) Performance Comparison and Data Presentation:

      - Regarding Reviewer 1's comments about using a few trajectory plots of the real-world data to visualize the results, we will include 1-2 trajectory plots of real-world datasets in the benchmark analysis to better visualize the results and assess accuracy.

      - Regarding Reviewer 2's comments concerning the lack of comparison results and discussion related to trajectory prediction methods based on deep learning, we will include a comparison with deep learning methods such as scTour and Tigon in the revision. Additionally, we will discuss the latest deep learning methods for bifurcation analysis and alternative trajectory inference methods such as CellRank.

      - Regarding Reviewer 2's comments on the impact of MURP, we will include an analysis on whether the number of MURPs affects the performance of the method and compare it with the random subsampling approach.

      (3) Article Calibration and Refinement:

      - Regarding Reviewer 2's comments on the discussion section, we will simplify the first three paragraphs to succinctly convey the background and implications of our contributions. Additionally, we will explain why HVG is considered as the entire feature space in our comparisons and analyses.

      - Regarding Reviewer 2's comments concerning the regulons in the microglia analysis, we will review the correct explanations and revise the article accordingly.

      - In response to the issues raised by both reviewers regarding grammatical errors, spelling mistakes, and inconsistencies between text and figures, we will review and correct any errors in the article. This includes providing explanations for all abbreviations upon their first appearance, ensuring the accuracy of text and figure descriptions, correcting equation numbering, improving image quality, and revising descriptions such as "the current manifold learning methods face two major challenges."

      (4) Enhancing Descriptions and Readability:

      - Regarding Reviewer 1's comments about the synthetic data, we will add a brief description in the main text on how synthetic data were generated.

      - Regarding Reviewer 1's comments on the survival analysis, we will provide a more detailed description of the computational steps and clarify whether key confounding factors such as age, clinical stage, and tumor purity were controlled.

      - Regarding Reviewer 2's comments on evaluation metrics, we will add detailed descriptions of the evaluation metrics and provide intuitive explanations of how different methods perform across various metrics in the comparison results.

      - Regarding Reviewer 2's comments on CD8+ T cells, we plan to compare MGPfact with Monocle3, in addition to Monocle2. This will help clarify the added value of MGPfact and provide a more comprehensive evaluation of its performance.

      - Regarding Reviewer 2's comments about consensus trajectories, we will add detailed descriptions of the process of generating consensus trajectories.

      - Regarding Reviewer 2's comments on regulons, we will include additional information on the process of downstream trajectory analysis and clarify the roles of SCENIC, GENIE3, RCisTarget, and AUCell in the bifurcation analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Preliminary note from the Reviewing Editor:

      The evaluations of the two Reviewers are provided for your information. As you can see, their opinions are very different.

      Reviewer #1 is very harsh in his/her evaluation. Clearly, we don't expect you to be able to affect one type of actin network without affecting the other, but rather to change the balance between the two. However, he/she also raises some valid points, in particular that more rationale should be added for the perturbations (also mentioned by Reviewer #2). Both Reviewers have also excellent suggestions for improving the presentation of the data.

      We sincerely appreciate your and the reviewers’ suggestions. The manuscript has been amended accordingly.

      On another point, I was surprised when reading your manuscript that a molecular description of chirality change in cells is presented as a completely new one. Alexander Bershadsky's group has identified several factors (including alpha-actinin) as important regulators of the direction of chirality. The articles are cited, but these important results are not specifically mentioned. Highlighting them would not call into question the importance of your work, but might even provide additional arguments for your model.

      We appreciate the editor’s comment. Alexander Bershadsky's group has done marvelous work in cell chirality. They introduced the stair-stepping and screw theory, which suggested how radial fiber polymerization generates ACW force and drives the actin cytoskeleton into the ACW pattern. Moreover, they have identified chiral regulators like alpha-actinin 1, mDia1, capZB, and profilin 1, which can reverse or neutralize the chiral expression.

      It is worth noting that Bershadsky's group primarily focuses on radial fibers. In our manuscript, instead, we primarily focused on the contractile unit in the transverse arcs and CW chirality in our investigation. Our manuscript incorporates our findings in the transverse arcs and the radial fibers theory by Bershadsky's group into the chirality balance hypothesis, providing a more comprehensive understanding of the chirality expression.

      We have included relevant articles from Alexander Bershadsky's group, and we agree that highlighting these important results on chiral regulators would further strengthen our manuscript. The manuscript was revised as follows:

      “ACW chirality can be explained by the right-handed axial spinning of radial fibers during polymerization, i.e. the ‘stair-stepping’ mode proposed by Tee et al. (Tee et al. 2015) (Figure 8A; Video 4). As actin filament is formed in a right-handed double helix, it possesses an intrinsic chiral nature. During the polymerization of radial fiber, the barbed end capped by formin at focal adhesion was found to recruit new actin monomers to the filament. The tethering by formin during the recruitment of actin monomers contributes to the right-handed tilting of radial fibers, leading to ACW rotation. Supporting this model, Jalal et al. (Jalal et al. 2019) showed that the silencing of mDia1, capZB, and profilin 1 would abolish the ACW chiral expression or reverse the chirality into CW direction. Specifically, the silencing of mDia1, capZB or profilin-1 would attenuate the recruitment of actin monomer into the radial fiber, with mDia1 acting as the nucleator of actin filament (Tsuji et al. 2002), CapZB promoting actin polymerization as capping protein (Mukherjee et al. 2016), and profilin-1 facilitating ATP-bound G-actin to the barbed ends (Haarer and Brown 1990; Witke 2004). The silencing resulted in a decrease in the elongation velocity of radial fiber, driving the cell into neutral or CW chirality. These results support our finding that a reduction of radial fiber elongation can invert the balance of chirality expression, changing the ACW-expressing cell into a neutral or CW-expressing cell.”

      By incorporating their findings into our revision and discussion, we provide additional support for our radial fiber-transverse arc balance model for chirality expression. The revision is made on pages 8 to 9, 13, lines 253 to 256, 284, 312 to 313, 443, 449 to 459.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kwong et al. present evidence that two actin-filament based cytoskeletal structures regulate the clockwise and anticlockwise rotation of the cytoplasm. These claims are based on experiments using cells plated on micropatterned substrates (circles). Previous reports have shown that the actomyosin network that forms on the dorsal surface of a cell plated on a circle drives a rotational or swirling pattern of movement in the cytoplasm. This actin network is composed of a combination of non-contractile radial stress fibers (AKA dorsal stress fibers) which are mechanically coupled to contractile transverse actin arcs (AKA actin arcs). The authors claim that directionality of the rotation of the cytoplasm (i.e., clockwise or anticlockwise) depends on either the actin arcs or radial fibers, respectively. While this would be interesting, the authors are not able to remove either actin-based network without affecting the other. This is not surprising, as it is likely that the radial fibers require the arcs to elongate them, and the arcs require the radial fibers to stop them from collapsing. As such, it is difficult to make simple interpretations such as the clockwise bias is driven by the arcs and anticlockwise bias is driven by the radial fibers.

      Weaknesses:

      (1) There are also multiple problems with how the data is displayed and interpreted. First, it is difficult to compare the experimental data with the controls as the authors do not include control images in several of the figures. For example, Figure 6 has images showing myosin IIA distribution, but Figure 5 has the control image. Each figure needs to show controls. Otherwise, it will be difficult for the reader to understand the differences in localization of the proteins shown. This could be accomplished by either adding different control examples or by combining figures.

      We appreciate the reviewer’s comment. We agree with the reviewer that it is difficult to compare our results in the current arrangement. The controls are included in the new Figure 6.

      (2) It is important that the authors should label the range of gray values of the heat maps shown. It is difficult to know how these maps were created. I could not find a description in the methods, nor have previous papers laid out a standardized way of doing it. As such, the reader needs some indication as to whether the maps showing different cells were created the same and show the same range of gray levels. In general, heat maps showing the same protein should have identical gray levels. The authors already show color bars next to the heat maps indicating the range of colors used. It should be a simple fix to label the minimum (blue on the color bar) and the maximum (red on the color bar) gray levels on these color bars. The profiles of actin shown in Figure 3 and Figure 3- figure supplement 3 were useful for interpreting the distribution of actin filaments. Why did the authors not show the same for the myosin IIa distributions?

      We appreciate the reviewer’s comment. For generating the distribution heatmap, the images were taken under the same setting (e.g., fluorescent staining procedure, excitation intensity, or exposure time). The prerequisite of cells for image stacking was that they had to be fully spread on either 2500 µm2 or 750 µm2 circular patterns. Then, the location for image stacking was determined by identifying the center of each cell spread in a perfect circle. Finally, the images were aligned at the cell center to calculate the averaged intensity to show the distribution heatmap on the circular pattern. Revision is made on pages 19 to 20, lines 668 to 677.

      It is important to note that the individual heatmaps represent the normalized distribution generated using unique color intensity ranges. This approach was chosen to emphasize the proportional distribution of protein within cells and its variations among samples, especially for samples with generally lower expression levels. Additionally, a differential heatmap with its own range was employed to demonstrate the normalized differences compared to the control sample. Furthermore, to provide additional insight, we plotted the intensity profile of the same protein with the same size for comparative analysis. Revision is made on pages 20, lines 679 to 682.
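      The stacking and per-heatmap normalization described in this response can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: images are plain nested lists of intensities, each cell centre is already known, the crop window fits inside every image, and the function names are hypothetical.

```python
def crop_centered(image, center, half):
    """Return the (2*half + 1)-square window of `image` centred on (row, col).

    Assumes the window lies fully inside the image.
    """
    r0, c0 = center
    return [row[c0 - half:c0 + half + 1] for row in image[r0 - half:r0 + half + 1]]

def average_aligned(images, centers, half):
    """Align crops at their cell centres and average them pixel by pixel."""
    crops = [crop_centered(img, c, half) for img, c in zip(images, centers)]
    size = 2 * half + 1
    n = len(crops)
    return [[sum(crop[r][c] for crop in crops) / n for c in range(size)]
            for r in range(size)]

def normalize(heatmap):
    """Rescale one heatmap to its own [0, 1] range (per-map colour range)."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in heatmap]

# Two hypothetical 5x5 cells whose bright centres sit at different positions.
img_a = [[0.0] * 5 for _ in range(5)]
img_a[2][2] = 10.0
img_b = [[0.0] * 5 for _ in range(5)]
img_b[1][3] = 30.0

avg = average_aligned([img_a, img_b], centers=[(2, 2), (1, 3)], half=1)
norm = normalize(avg)
```

      Because each map is rescaled to its own range, normalized maps show relative distribution within a sample and cannot be compared in absolute intensity, which is why labelled color bars and intensity profiles are needed alongside them.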

      The labels of the heatmap are included to show the intensity in the revised Figure 3, Figure 5, Figure 6, and Figure 3 —figure supplement 4.

      To better illustrate the myosin IIa distribution, the myosin intensity profiles were plotted for Y27 treatment and gene silencing. The figures are included as Figure 5—figure supplement 2 and Figure 6—figure supplement 2. Revisions are made on pages 10, lines 332 to 334 and pages 11, lines 377 to 379.

      (3) Line 189 "This absence of radial fibers is unexpected". The authors should clarify what they mean by this statement. The claim that the cell in Figure 3B has reduced radial stress fiber is not supported by the data shown. Every actin structure in this cell is reduced compared to the cell on the larger micropattern in Figure 3A. It is unclear if the radial stress fibers are reduced more than the arcs. Are the authors referring to radial fiber elongation?

      We appreciate the reviewer’s comment. We calculated the structures' pixel number and the percentage in the image to better illustrate the reduction of radial fiber or transverse arc. As radial fibers emerge from the cell boundary and point towards the cell center and the transverse arcs are parallel to the cell edge, the actin filaments can be identified by their angle with respect to the cell center. We found that the pixel number of radial fiber is reduced by 91.98 % on the 750 µm2 pattern compared to the 2500 µm2 pattern, while the pixel number of transverse arc is reduced by 70.58 % (Figure 3- figure supplement 3A). Additionally, we compared the percentage of actin structures on different pattern sizes (Figure 3- figure supplement 3B). On the 2500 µm2 pattern, the percentage of radial fiber in the actin structure is 61.76 ± 2.77 %, but it only accounts for 31.13 ± 2.76 % on the 750 µm2 pattern. These results provide evidence of the structural reduction on a smaller pattern.
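      The angle-based separation of radial fibers from transverse arcs can be sketched as below. This is a hypothetical illustration: the 45 degree cutoff, the function names, and the representation of a filament pixel as (x, y, local orientation in degrees) are assumptions, since the response does not give these details.

```python
import math

def classify_segment(x, y, orientation_deg, center=(0.0, 0.0), cutoff_deg=45.0):
    """Label a filament pixel by the angle between its local orientation
    and the radial direction from the cell centre through (x, y)."""
    radial_deg = math.degrees(math.atan2(y - center[1], x - center[0]))
    diff = abs(orientation_deg - radial_deg) % 180.0
    diff = min(diff, 180.0 - diff)  # orientations are undirected: fold into [0, 90]
    return "radial" if diff < cutoff_deg else "transverse"

def radial_fraction(segments):
    """Fraction of filament pixels labelled as radial."""
    labels = [classify_segment(x, y, o) for x, y, o in segments]
    return labels.count("radial") / len(labels)

# Hypothetical pixels: (x, y, local filament orientation in degrees).
segments = [(10, 0, 0), (10, 0, 90), (0, 10, 90), (0, 10, 180)]
fraction = radial_fraction(segments)
```

      Multiplying such a fraction by 100 gives a percentage of the kind reported for the two pattern sizes; the actual values depend on the segmentation and cutoff the authors used.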

      Regarding the radial fiber elongation, we only discussed the reduction of radial fiber on the 750 µm2 pattern compared to the 2500 µm2 pattern in this part. To better understand the contribution of radial fibers to chirality, we compared the radial fiber elongation rate in the LatA treatment and control on the 2500 µm2 pattern (Figure 4). This result suggests the potential role of radial fiber in cell chirality. Revisions are made on page 6, lines 186 to 194; pages 17 to 18, lines 601 to 606; and the new Figure 3- figure supplement 3.

      (4) The choice of the small molecule inhibitors used in this study is difficult to understand, and their results are also confusing. For example, sequestering G actin with Latrunculin A is a complicated experiment. The authors use a relatively low concentration (50 nM) and show that actin filament-based structures are reduced and there are more in the center of the cell than in controls (Figure 3E). What was the logic of choosing this concentration?

      We appreciate the reviewer’s comment. The concentrations of the drugs were selected based on the literature and their known effects on actin arrangement or chiral expression.

      For example, Latrunculin A was used at 50 nM, as concentrations at or below 50 nM have been shown to reverse the chirality (Bao et al., 2020; Chin et al., 2018; Kwong et al., 2019; Wan et al., 2011). Similarly, the 2 µM A23187 treatment concentration was selected to initiate actin remodeling (Shao et al., 2015). Furthermore, NSC23766 at 100 µM was found to efficiently inhibit Rac1 activation and resulted in a distinct change in actin structure (Chen et al., 2011; Gao et al., 2004), enhancing ACW chiral expression. The revision is made on pages 6 to 7, lines 202 to 211.

      (5) Using a small molecule that binds the barbed end (e.g., cytochalasin) could conceivably be used to selectively remove longer actin filaments, which the radial fibers have compared to the lamellipodia and the transverse arcs. The authors should articulate how the actin cytoskeleton is being changed by latrunculin treatment and the impact on chirality. Is it just that the radial stress fibers are not elongating? There seems to be more radial stress fibers than in controls, rather than an absence of radial stress fibers.

      We appreciate the reviewer’s comment. Our results showed that Latrunculin A treatment reversed the cell chirality. To compare the amount of radial fiber and transverse arc, we calculated the structures' pixel percentage. We found that the percentage of radial fiber pixels with LatA treatment was reduced compared to that of the control, while the percentage of transverse arc pixels increased (Figure 3— figure supplement 5). This result suggests that radial fibers are inhibited under Latrunculin A treatment.

      Furthermore, the elongation rate of radial fibers is reduced by Latrunculin A treatment (Figure 4). This result, along with the reduction of radial fiber percentage under Latrunculin A treatment, suggests a significant impact of radial fibers on the ACW chirality. Revisions are made on pages 7 to 8, lines 244 to 250 and the new Figure 3— figure supplement 5 and Figure 3— figure supplement 6.

      (6) Similar problems arise from the other small molecules as well. LPA has more effects than simply activating RhoA. Additionally, many of the quantifiable effects of LPA treatment are apparent only after the cells are serum starved, which does not seem to be the case here.

      We appreciate the reviewer’s comment. The reviewer mentioned that the quantifiable effects of LPA treatments were seen after the cells were serum-starved. LPA is known to be a serum component and has an affinity to albumin in serum (Moolenaar, 1995). Serum starvation is often employed to better observe the effects of LPA by comparing conditions with and without LPA. We agree with the reviewer that the effect of LPA cannot be fully seen under the current setting. Based on the reviewer’s comment and after careful consideration, we have decided to remove the data related to LPA from our manuscript. Revisions are made on pages 6 to 7, 17 and Figure 3— figure supplement 4.

      (7) Furthermore, inhibiting ROCK with Y-27632 affects myosin light chain phosphorylation and is not specific to myosin IIA. Are the two other myosin II paralogs expressed in these cells (myosin IIB and myosin IIC)? If so, the authors’ statements about this experiment should refer to myosin II, not myosin IIa.

      We appreciate the reviewer’s comment. We agree that ensuring accuracy and clarity in our statements is important. The terminology is revised to myosin II regarding the Y27632 experiment for a more concise description. Revision is made on pages 9 to 10 and 29, lines 317 to 341, 845 and 848.  

      (8) None of the uses of the small molecules above have supporting data using a different experimental method. For example, backing up the LPA experiment by perturbing RhoA tho.

      We appreciate the reviewer’s comment. After careful consideration, we have decided to remove the data related to LPA from our manuscript. Revisions are made on pages 6 to 7, 17 and Figure 3— figure supplement 4.

      (9) The use of SMIFH2 as a "formin inhibitor" is also problematic. SMIFH2 also inhibits myosin II contractility, making interpreting its effects on cells difficult to impossible. The authors present data of mDia2 knockdown, which would be a good control for this SMIFH2.

      We appreciate the reviewer’s comment. We agree that there is potential interference of SMIFH2 with myosin II contractility, which could introduce confounding factors to the results. Based on your comment and further consideration, we have decided to remove the data related to SMIFH2 from our manuscript. Revisions are made on pages 6 to 7, 10, 17 and Figure 3— figure supplement 4.

      (10) However, the authors claim that mDia2 "typically nucleates tropomyosin-decorated actin filaments, which recruit myosin II and anneal endwise with α-actinin- crosslinked actin filaments."

      There is no reference to this statement and the authors' own data shows that both arcs and radial fibers are reduced by mDia2 knockdown. Overall, the formin data does not support the conclusions the authors report.

      We appreciate the reviewer’s comment. We apologize for the lack of citation for this claim. To address this, we have added a reference to support this claim in the revised manuscript (Tojkander et al., 2011). Revision is made on page 10, line 345 to 347.

      Regarding the actin structure after mDia2 gene silencing, our results showed that myosin II was dissociated from the actin filaments compared to the control. At the same time, there are no considerable differences in the actin structure of radial fibers and transverse arcs between the mDia2 gene silencing and the control.

      (11) The data in Figure 7 does not support the conclusion that myosin IIa is exclusively on top of the cell. There are clear ventral stress fibers in A (actin) that have myosin IIa localization. The authors simply chose to not draw a line over them to create a height profile.

      We appreciate the reviewer’s comment. To better illustrate the myosin IIa distribution in a cell, we have included a video showing the myosin IIa staining from the base to the top of the cell (Video 7). At the cell base, the intensity of myosin IIa is relatively low at the center. However, as the focal plane is raised, myosin II clearly localizes near the top of the cell (Figure 7B and Video 7). Revisions are made on page 12, lines 421 to 424, and in the new Video 7.

      Reviewer #2 (Public Review):

      Summary:

      Chirality of cells, organs, and organisms can stem from the chiral asymmetry of proteins and polymers at a much smaller lengthscale. The intrinsic chirality of actin filaments (F-actin) is implicated in the chiral arrangement and movement of cellular structures including F-actin-based bundles and the nucleus. It is unknown how opposite chiralities can be observed when the chirality of F-actin is invariant. Kwong, Chen, and co-authors explored this problem by studying chiral cell-scale structures in adherent mammalian cultured cells. They controlled the size of adhesive patches, and examined chirality at different timepoints. They made various molecular perturbations and used several quantitative assays. They showed that forces exerted by antiparallel actomyosin bundles on parallel radial bundles are responsible for the chirality of the actomyosin network at the cell scale.

      Strengths:

      Whereas previously, most effort has been put into understanding radial bundles, this study makes an important distinction that transverse or circumferential bundles are made of antiparallel actomyosin arrays. A minor point that was nice for the paper to make is that between the co-existing chirality of nuclear rotation and radial bundle tilt, it is the F-actin driving nuclear rotation and not the other way around. The paper is clearly written.

      Weaknesses:

      The paper could benefit from grammatical editing. Once the following Major and Minor points are addressed, which may not require any further experimentation and does not entail additional conditions, this manuscript would be appropriate for publication in eLife.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (1) The binary classification of cells as exhibiting clockwise or anticlockwise F-actin structures does not capture the instances where there is very little chirality, such as in the mDia2-depleted cells on small patches (Figure 6B). Such reports of cell chirality throughout the cell population need to be reported as the average angle of F-actin structures on a per cell basis as a rose plot or scatter plot of angle. These changes to cell-scoring and data display will be important to discern between conditions where chirality is random (50% CW, 50% ACW) from conditions where chirality is low (radial bundles are radial and transverse arcs are circumferential).

      We appreciate the reviewer’s comment. We apologize if we did not convey our analysis method clearly enough. Throughout the manuscript, unless mentioned otherwise, the chirality analysis was based on the chiral nucleus rotation within a period of observation. The only exception is the F-actin structure chirality in Figure 3—figure supplement 1, in which we analyzed the angle of radial fibers in control cells on the 2500 µm² pattern. This is described on pages 5 to 6, lines 169-172, and in the Methods section “Analysis of fiber orientation and actin structure on circular pattern” on page 17.

      Based on the feedback, we attempted to use a scatter plot to present the mDia2 overexpression and silencing to show the randomness of the result. However, because scatter plots primarily focus on visualizing the distribution, they become cluttered and visually overwhelming, as shown below.

      Author response image 1.

      (A) Percentage of ACW nucleus rotational bias on the 2500 µm² pattern with untreated control (reused data from Figure 3D, n = 57), mDia2 silencing (n = 48), and mDia2 overexpression (n = 25). (B) Probability of ACW/CW rotation on the 750 µm² pattern with untreated control (reused data from Figure 3E, n = 34), mDia2 silencing (n = 53), and mDia2 overexpression (n = 22). Mean ± SEM. Two-sample equal variance two-tailed t-test.

      Therefore, in our manuscript, the data are primarily presented as column bar charts with statistical analysis by Student’s t-test. Column bar charts make it easier to understand and compare values. In brief, Student’s t-test is commonly used to evaluate whether the means of two groups are significantly different, assuming equal variance. As such, Student’s t-test is able to discern the randomness of the chirality.
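
      As a rough illustration of the two-sample equal-variance t-test mentioned above, the following minimal sketch computes the pooled-variance t statistic and its degrees of freedom. The function name and the replicate values are hypothetical and are not data from the study; obtaining a p-value would additionally require the t distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample equal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: sample variances weighted by their degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical percentages of ACW-biased cells per replicate (illustration only).
control = [70.0, 74.0, 72.0]
treated = [50.0, 54.0, 52.0]
t_stat, dof = pooled_t_statistic(control, treated)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      A large absolute t value for a given number of degrees of freedom indicates that the two group means differ by more than the within-group scatter would suggest.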

      (2) The authors need to discuss the likely nucleator of F-actin in the radial bundles, since it is apparently not mDia2 in these cells.

      We appreciate the reviewer’s comment. In our manuscript, we originally focused on mDia2 and Tpm4 as they are the transverse arc nucleator and the mediator of myosin II motion. However, we agree with the reviewer that discussing the radial fiber nucleator would provide more insight into radial fiber polymerization in ACW chirality and improve the completeness of the story.

      Radial fibers polymerize at the focal adhesion. Several proteins are involved in actin nucleation or stress fiber formation at the focal adhesion, such as the Arp2/3 complex (Serrels et al., 2007), Ena/VASP (Applewhite et al., 2007; Gateva et al., 2014), and formins (Dettenhofer et al., 2008; Sahasrabudhe et al., 2016; Tsuji et al., 2002). Within the formin family, mDia1 is the likely nucleator of F-actin in the radial bundle. The presence of mDia1 facilitates the elongation of actin bundles at the focal adhesion (Hotulainen and Lappalainen, 2006). Studies by Jalal et al. (2019) and Tee et al. (2023) have demonstrated that silencing of mDia1 abolished the ACW actin expression. Silencing of other nucleation proteins, such as the Arp2/3 complex or Ena/VASP, would only reduce the ACW actin expression without abolishing it.

      Based on these findings, the attenuation of radial fiber elongation would abolish the ACW chiral expression, providing more support for our model in explaining chirality expression.

      This part is incorporated into the Discussion. The revision is made on page 13, lines 443, 449 to 459.

      Minor:

      (1) In the introduction, additional observations of handedness reversal need to be referenced (line 79), including Schonegg, Hyman, and Wood 2014 and Zaatri, Perry, and Maddox 2021.

      We appreciate the reviewer’s comment. The references reporting handedness reversal are now cited on page 3, lines 78 to 79.

      (2) For clarity of logic, the authors should share the rationale for choosing, and results from administering, the collection of compounds as presented in Figure 3 one at a time instead of as a list.

      We appreciate the reviewer’s comment. The concentration of drugs was determined based on existing literature and their known outcomes on actin arrangement or chiral expression.

      To elucidate, the use of Latrunculin A was based on previous studies, which demonstrated that it reverses the chirality at or below 50 nM (Bao et al., 2020; Chin et al., 2018; Kwong et al., 2019; Wan et al., 2011). Because inhibiting F-actin assembly can lead to the expression of CW chirality, we hypothesized that the opposite treatment might enhance ACW chirality. Therefore, we chose A23187 treatment at a concentration of 2 µM, as it could initiate actin remodeling and stress fiber formation (Shao et al., 2015).

      Furthermore, in an attempt to replicate the reversal of chirality by inhibiting F-actin assembly through other pathways, we explored NSC23677 at 100 µM, which was found to inhibit Rac1 activation (Chen et al., 2011; Gao et al., 2004) and reduce cortical F-actin assembly (Head et al., 2003). However, it failed to reverse the chirality and instead enhanced the ACW chirality of the cell.

      We carefully selected the drugs and the applied concentrations to investigate various pathways and mechanisms that influence actin arrangement and might affect the chiral expression. We believe that this clarification strengthens the rationale behind our choice of drugs. The revision is made on pages 6 to 7, lines 202 to 211.

      (3) "Image stacking" isn't a common term to this referee. Its first appearance in the main text (line 183) should be accompanied with a call-out to the Methods section. The authors could consider referring to this approach more directly. Related issue: Image stacking fails to report the prominent enrichment of F-actin at the very cell periphery (see Figure 3 A and F) except for with images of cells on small islands (Figure 3H). Since this data display approach seems to be adding the intensity from all images together, and since cells on circular adhesive patches are relatively radially symmetric, it is unclear how to align cells, but perhaps cells could be aligned based on a slight asymmetry such as the peripheral location with highest F-actin intensity or the apparent location of the centrosome.

      We appreciate the reviewer’s comment. We fully acknowledge the uncommon use of “image stacking” and the insufficient description of image stacking in the Methods section. First, we have added a call-out to the Methods section at its first appearance (page 6, lines 182 to 183). The method of image stacking is as follows. When generating the distribution heatmap, the images were taken under the same settings (e.g., staining procedure, fluorescent intensity, exposure time, etc.). The prerequisite for cells to be included in image stacking was that they had to be fully spread on either 2500 µm² or 750 µm² circular patterns. Then, a consistent position for image stacking could be found by identifying the center of each cell spread in a perfect circle. Finally, the images were aligned at the center to calculate the averaged intensity and show the distribution heatmap on the circular pattern.

      We agree with the reviewer that our image alignment and stacking are based on cells that are radially symmetric. As such, the intensity distribution of the stacked image is intended to compare differences in F-actin along the radial direction. Revision is made on page 19, lines 668 to 682.
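
      The stacking procedure described above (align each image at the cell center, then average the intensity pixel by pixel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: images are plain nested lists, the cell centers are assumed to be already known, and the names `center_crop` and `stack_images` as well as the toy images are hypothetical.

```python
def center_crop(image, center, half_size):
    """Crop a square window of side 2*half_size + 1 around a (row, col) center."""
    row_c, col_c = center
    rows = range(row_c - half_size, row_c + half_size + 1)
    cols = range(col_c - half_size, col_c + half_size + 1)
    if rows[0] < 0 or cols[0] < 0 or rows[-1] >= len(image) or cols[-1] >= len(image[0]):
        raise ValueError("crop window extends beyond the image")
    return [[image[r][c] for c in cols] for r in rows]


def stack_images(images, centers, half_size):
    """Average center-aligned crops pixel by pixel to obtain a mean intensity map."""
    crops = [center_crop(img, ctr, half_size) for img, ctr in zip(images, centers)]
    side = 2 * half_size + 1
    return [
        [sum(crop[r][c] for crop in crops) / len(crops) for c in range(side)]
        for r in range(side)
    ]


# Two hypothetical 5x5 images whose bright pixel sits at different positions.
img_a = [[0] * 5 for _ in range(5)]
img_a[2][2] = 10
img_b = [[0] * 5 for _ in range(5)]
img_b[1][3] = 20

# After alignment at the given centers, the bright pixels coincide and are averaged.
stacked = stack_images([img_a, img_b], centers=[(2, 2), (1, 3)], half_size=1)
```

      Because the alignment uses only the center, any feature that is not radially symmetric is smeared out by the averaging, which is consistent with the radial comparison described above.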

      (4) The authors need to be consistent with wording about chirality, avoiding "right" and left (e.g. lines 245-6) since if the cell periphery were oriented differently in the cropped view, the tilt would be a different direction side-to-side but the same chirality. This section is confusing since the peripheral radial bundles are quite radial, and the inner ones are pointing from upper left to lower right, pointing (to the right) more downward over time, rather than more right-ward, in the cropped images.

      We appreciate the reviewer’s comment. We apologize for the confusion caused by our description of the tilting direction. For consistency with our later description, we define the “right” or “left” direction of the radial fibers with reference to the direction of radial fiber elongation, so that “rightward tilting” corresponds to the ACW rotation of the chiral pattern. To retain the term “rightward tilting”, we added this description to the text. We also rearranged the images in the new Figure 4A and Video 2 for better observation. Revision is made on page 8, lines 262 to 263.

      (5) Why are the cells in Figure 4A dominated by radial (and more-central, tilting) fibers, while control cells in 4D show robust circumferential transverse arcs? Have these cells been plated for different amounts of time or is a different optical section shown?

      We appreciate the reviewer’s comment. The cells in Figure 4A and Figure 4D were prepared under similar conditions, such as incubation time and optical settings. Actin organization is a dynamic process, and cells can exhibit varied actin arrangements, transitioning between different forms such as circular, radial, chordal, chiral, or linear patterns, as they spread on a circular island (Tee et al., 2015). In Figure 4A, the actin is arranged in a chiral pattern, whereas in Figure 4D, the actin exhibits a radial pattern. These variations reflect the natural dynamics of actin organization within cells during the imaging process.

      (6) All single-color images (such as Fig 5 F-actin) need to be black-on-white, since it is far more difficult to see F-actin morphology with red on black.

      We appreciate the reviewer’s comment. We have changed all F-actin images (single color) into black and white for better image clarity. Revisions are made in the new Figure 5, Figure 6 and Figure 7.

      (7) Figure 5A, especially the F-actin staining, is quite a bit blurrier than other micrographs. These images should be replaced with images of comparable quality to those shown throughout.

      We appreciate the reviewer’s comment. We agree that the F-actin staining in Figure 5 is difficult to observe. To improve image clarity, the F-actin staining images have been replaced with more zoomed-in images. Revision is made in the new Figure 5.

      (8) F-actin does not look unchanged by Y27632 treatment, as the authors state in line 306. This may be partially due to image quality and the ambiguities of communicating with the blue-to-red colormap. Similarly, I don't agree that mDia2 depletion did not change F-actin distribution (line 330) as cells in that condition had a prominent peripheral ring of F-actin missing from cells in other conditions.

      We appreciate the reviewer’s comment. We agree with the reviewer’s observation that the F-actin distribution is indeed changed under Y27632 treatment compared to the control in Figure 5A-B. Here, we would like to emphasize that the actin ring persists despite the actin structure being altered under the Y27632 treatment. The actin ring refers to the darker red circle in the distribution heatmap. It represents the condensed actin structure, including radial fibers and transverse arcs. This important structure remains unaffected despite the disruption of myosin II, the key component in radial fiber.

      Furthermore, we agree with the reviewer that mDia2 depletion does change the F-actin distribution. Similar to the Y27632 treatment, the actin ring persists despite the actin structure being altered under mDia2 gene silencing. Moreover, compared to other treatments, mDia2 depletion has a less significant impact on actin distribution. To address these points more comprehensively, we have revised the Y27632 treatment and mDia2 sections. The revisions for Y27632 and mDia2 are made on page 10, lines 324-327 and 352-353, respectively.

      (9) The colormap shown for intensity coding should be reconsidered, as dark red is harder to see than the yellow that is sub-maximal. Viridis is a colormap ranging from cooler and darker blue, through green, to warmer and lighter yellow as the maximum. Other options likely exist as well.

      We appreciate the reviewer’s comment. We carefully considered the reviewer’s concern and explored other color scale choices in the colormap function in Matlab. After evaluating different options, including the “Viridis” color scale, we found that “jet” provides a wide range of colors, allowing effective visual presentation of the intensity variation in our data. The use of “jet” allows us to appropriately visualize the actin ring distribution, which is represented in red or dark red. While we understand that dark red could be harder to see than the sub-maximal yellow, we believe that “jet” serves our purpose of presenting the intensity information.

      (10) For Figure 6, why doesn't average distribution of NMMIIa look like the example with high at periphery, low inside periphery, moderate throughout lamella, low perinuclear, and high central?

      We appreciate the reviewer’s comment. We understand the reviewer’s concern that the average distribution of NMMIIa does not look the same as the example. The chosen image is the best representation of the NMMIIa disruption from the transverse arcs after mDia2 silencing. Additionally, it is important to note that the average distribution result is a stacked image that includes other images. As such, the NMMIIa example and the distribution heatmap might not necessarily appear identical.

      (11) In 2015, Tee, Bershadsky and colleagues demonstrated that transverse bundles are dorsal to radial bundles, using correlative light and electron microscopy. While it is important for Kwong and colleagues to show that this is true in their cells, they should reference Tee et al. in the rationale section of text pertaining to Figure 7.

      We appreciate the reviewer’s comment. Tee et al. (2015) demonstrated that the transverse fiber is at the same height as the radial fiber based on correlative light and electron microscopy. Here, using the position of myosin IIa, a transverse arc component, our results show the dorsal positioning of transverse arcs with connection to the extension of radial fibers (Figure 7C), which is consistent with their findings. This is included in our manuscript on page 12, lines 421 to 424, and page 14, lines 477 to 480.

      Reference

      Applewhite, D.A., Barzik, M., Kojima, S.-i., Svitkina, T.M., Gertler, F.B., and Borisy, G.G. (2007). Ena/Vasp Proteins Have an Anti-Capping Independent Function in Filopodia Formation. Mol. Biol. Cell. 18, 2579-2591. DOI: https://doi.org/10.1091/mbc.e06-11-0990

      Bao, Y., Wu, S., Chu, L.T., Kwong, H.K., Hartanto, H., Huang, Y., Lam, M.L., Lam, R.H., and Chen, T.H. (2020). Early Committed Clockwise Cell Chirality Upregulates Adipogenic Differentiation of Mesenchymal Stem Cells. Adv. Biosyst. 4, 2000161. DOI: https://doi.org/10.1002/adbi.202000161

      Chen, Q.-Y., Xu, L.-Q., Jiao, D.-M., Yao, Q.-H., Wang, Y.-Y., Hu, H.-Z., Wu, Y.-Q., Song, J., Yan, J., and Wu, L.-J. (2011). Silencing of Rac1 Modifies Lung Cancer Cell Migration, Invasion and Actin Cytoskeleton Rearrangements and Enhances Chemosensitivity to Antitumor Drugs. Int. J. Mol. Med. 28, 769-776. DOI: https://doi.org/10.3892/ijmm.2011.775

      Chin, A.S., Worley, K.E., Ray, P., Kaur, G., Fan, J., and Wan, L.Q. (2018). Epithelial Cell Chirality Revealed by Three-Dimensional Spontaneous Rotation. Proc. Natl. Acad. Sci. U.S.A. 115, 12188-12193. DOI: https://doi.org/10.1073/pnas.1805932115

      Dettenhofer, M., Zhou, F., and Leder, P. (2008). Formin 1-Isoform IV Deficient Cells Exhibit Defects in Cell Spreading and Focal Adhesion Formation. PLoS One 3, e2497. DOI: https://doi.org/10.1371/journal.pone.0002497

      Gao, Y., Dickerson, J.B., Guo, F., Zheng, J., and Zheng, Y. (2004). Rational Design and Characterization of a Rac GTPase-Specific Small Molecule Inhibitor. Proc. Natl. Acad. Sci. U.S.A. 101, 7618-7623. DOI: https://doi.org/10.1073/pnas.0307512101

      Gateva, G., Tojkander, S., Koho, S., Carpen, O., and Lappalainen, P. (2014). Palladin Promotes Assembly of Non-Contractile Dorsal Stress Fibers through Vasp Recruitment. J. Cell Sci. 127, 1887-1898. DOI: https://doi.org/10.1242/jcs.135780

      Haarer, B., and Brown, S.S. (1990). Structure and Function of Profilin.

      Head, J.A., Jiang, D., Li, M., Zorn, L.J., Schaefer, E.M., Parsons, J.T., and Weed, S.A. (2003). Cortactin Tyrosine Phosphorylation Requires Rac1 Activity and Association with the Cortical Actin Cytoskeleton. Mol. Biol. Cell. 14, 3216-3229. DOI: https://doi.org/10.1091/mbc.e02-11-0753

      Hotulainen, P., and Lappalainen, P. (2006). Stress Fibers are Generated by Two Distinct Actin Assembly Mechanisms in Motile Cells. J. Cell Biol. 173, 383-394. DOI: https://doi.org/10.1083/jcb.200511093

      Jalal, S., Shi, S., Acharya, V., Huang, R.Y., Viasnoff, V., Bershadsky, A.D., and Tee, Y.H. (2019). Actin Cytoskeleton Self-Organization in Single Epithelial Cells and Fibroblasts under Isotropic Confinement. J. Cell Sci. 132. DOI: https://doi.org/10.1242/jcs.220780

      Kwong, H.K., Huang, Y., Bao, Y., Lam, M.L., and Chen, T.H. (2019). Remnant Effects of Culture Density on Cell Chirality after Reseeding. J. Cell Sci. 132. DOI: https://doi.org/10.1242/jcs.220780

      Moolenaar, W.H. (1995). Lysophosphatidic Acid, a Multifunctional Phospholipid Messenger. J. Cell Sci. 132. DOI: https://doi.org/10.1242/jcs.220780

      Mukherjee, K., Ishii, K., Pillalamarri, V., Kammin, T., Atkin, J.F., Hickey, S.E., Xi, Q.J., Zepeda, C.J., Gusella, J.F., and Talkowski, M.E. (2016). Actin Capping Protein Capzb Regulates Cell Morphology, Differentiation, and Neural Crest Migration in Craniofacial Morphogenesis. Hum. Mol. Genet. 25, 1255-1270. DOI: https://doi.org/10.1093/hmg/ddw006

      Sahasrabudhe, A., Ghate, K., Mutalik, S., Jacob, A., and Ghose, A. (2016). Formin 2 Regulates the Stabilization of Filopodial Tip Adhesions in Growth Cones and Affects Neuronal Outgrowth and Pathfinding In Vivo. Development 143, 449-460. DOI: https://doi.org/10.1242/dev.130104

      Serrels, B., Serrels, A., Brunton, V.G., Holt, M., McLean, G.W., Gray, C.H., Jones, G.E., and Frame, M.C. (2007). Focal Adhesion Kinase Controls Actin Assembly via a Ferm-Mediated Interaction with the Arp2/3 Complex. Nat. Cell Biol. 9, 1046-1056. DOI: https://doi.org/10.1038/ncb1626

      Shao, X., Li, Q., Mogilner, A., Bershadsky, A.D., and Shivashankar, G. (2015). Mechanical Stimulation Induces Formin-Dependent Assembly of a Perinuclear Actin Rim. Proc. Natl. Acad. Sci. U.S.A. 112, E2595-E2601. DOI: https://doi.org/10.1073/pnas.1504837112

      Tee, Y.H., Goh, W.J., Yong, X., Ong, H.T., Hu, J., Tay, I.Y.Y., Shi, S., Jalal, S., Barnett, S.F., and Kanchanawong, P. (2023). Actin Polymerisation and Crosslinking Drive Left-Right Asymmetry in Single Cell and Cell Collectives. Nat. Commun. 14, 776. DOI: https://doi.org/10.1038/s41467-023-35918-1

      Tee, Y.H., Shemesh, T., Thiagarajan, V., Hariadi, R.F., Anderson, K.L., Page, C., Volkmann, N., Hanein, D., Sivaramakrishnan, S., Kozlov, M.M., and Bershadsky, A.D. (2015). Cellular Chirality Arising from the Self-Organization of the Actin Cytoskeleton. Nat. Cell Biol. 17, 445-457. DOI: https://doi.org/10.1038/ncb3137

      Tojkander, S., Gateva, G., Schevzov, G., Hotulainen, P., Naumanen, P., Martin, C., Gunning, P.W., and Lappalainen, P. (2011). A Molecular Pathway for Myosin II Recruitment to Stress Fibers. Curr. Biol. 21, 539-550. DOI: https://doi.org/10.1016/j.cub.2011.03.007

      Tsuji, T., Ishizaki, T., Okamoto, M., Higashida, C., Kimura, K., Furuyashiki, T., Arakawa, Y., Birge, R.B., Nakamoto, T., Hirai, H., and Narumiya, S. (2002). Rock and mdia1 Antagonize in Rho-Dependent Rac Activation in Swiss 3T3 Fibroblasts. J. Cell Biol. 157, 819-830. DOI: https://doi.org/10.1083/jcb.200112107

      Wan, L.Q., Ronaldson, K., Park, M., Taylor, G., Zhang, Y., Gimble, J.M., and Vunjak-Novakovic, G. (2011). Micropatterned Mammalian Cells Exhibit Phenotype-Specific Left-Right Asymmetry. Proc. Natl. Acad. Sci. U.S.A. 108, 12295-12300. DOI: https://doi.org/10.1073/pnas.1103834108

      Witke, W. (2004). The Role of Profilin Complexes in Cell Motility and Other Cellular Processes. Trends Cell Biol. 14, 461-469. DOI: https://doi.org/10.1016/j.tcb.2004.07.003

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable contribution studies factors that impact molecular exchange between dense and dilute phases of biomolecular condensates through continuum models and coarse-grained simulations. The authors provide solid evidence that interfacial resistance can cause molecules to bounce off the interface and limit mixing. Results like these can inform how experimental results in the field of biological condensates are interpreted.

      We would like to sincerely thank the editors for spending time on our manuscript and for the very positive assessment of our work. We have carefully considered and addressed the reviewers’ comments in the point-by-point response below and have revised our manuscript accordingly.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Zhang et al. build a physical framework to probe the mechanisms that underlie the exchange of molecules between coexisting dense and dilute liquid-like phases of condensates. They first propose a continuum model, in the context of a FRAP-like experiment where the fluorescently labeled molecules inside the condensate are bleached at t=0 and the recovery of fluorescence is measured. Through this model, they identify how the key timescales of internal molecular mixing, replenishment from the dilute phase, and interface transfer contribute to the molecular exchange timescale. Motivated by a recent experiment reported by some of the co-authors previously (Brangwynne et al. in 2019) finding strong interfacial resistance in in vitro protein droplets of LAF-1, they seek to understand the microscopic features contributing to the interfacial conductance (inversely proportional to the resistance). To check, they perform coarse-grained MD simulations of sticker-spacer self-associative polymers and report how conductance varies significantly even across the few explored sequences. Further, by looking at individual trajectories, they postulate that "bouncing" - i.e., molecules that approach the interface but are not successfully absorbed - is a strong contributor to this mass transfer limitation. Consistent with their predictions, sequences that have more free unbound stickers (for example, through imbalanced sequence sticker stoichiometries) have higher conductances and they show a simple linear scaling between the number of unbound stickers and conductance. Finally, they predict a droplet-size-dependent transition in recovery time behavior.

      Strengths:

      (1) This paper is well-written overall and clear to understand.

      (2) By combining coarse-grained simulations, continuum modeling, and comparison to published data, the authors provide a solid picture of how their proposed framework relates to molecular exchange mechanisms that are dominated by interface resistance and LAF-1 droplets.

      (3) The choice of different ways to estimate conductance from simulation and reported data are thoughtful and convincing in their near agreement (although a little discussion of why and when they differ would be merited as well).

      We would like to thank the reviewer for the positive evaluation of our work. Indeed, we are grateful to the reviewer for this thoughtful, detailed, and constructive report, which has helped us strengthen the manuscript.

      Weaknesses:

      (1) Almost the entirety of this paper is motivated by a previously reported FRAP experiment on a particular LAF-1 droplet in vitro. There are a few major concerns I have with how the original data is used, how these results may generalize, and the lack of connection of predictions with any other experiments (published or new).

      a. The mean values of cdense, cdilute, diffusivities, etc. are taken from Taylor et al. to rule in the importance of interfacial mass transfer limits. While this may be true, the values originally inferred (in the 2019 paper that this paper is strongly built off) report extremely large confidence intervals/inferred standard errors. The authors should accordingly report all their inferences with correct standardized errors or confidence intervals, which in turn, allow us to better understand these data.

      Yes, agreed. We have now included the standard errors of the parameters from Taylor et al. (2019), and reported the corresponding standard errors for the timescales and interface conductance using error propagation. We have modified Fig. 1C right panel as well as the text in the figure caption:

      “(Right) Expected recovery times and if the slowest recovery process was either the flux from the dilute phase or diffusion within the droplet, respectively, with and taken from Taylor et al. (2019). While the timescale associated with interface resistance is unknown, the measured recovery time is much longer than and , suggesting the recovery is limited by flux through the interface, with an interface conductance of  (Below Figure 1)”
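
      The error propagation mentioned in this response can be illustrated with a minimal sketch of first-order propagation for a power-law product, assuming independent uncertainties. The scaling tau ~ R**2 / D used below is only the generic diffusive scaling with prefactors omitted, not the manuscript's exact expressions, and all numerical values and names are hypothetical.

```python
import math


def propagate_power_law(values, errors, exponents):
    """Return (f, sigma_f) for f = prod(x_i ** p_i), assuming independent errors.

    To first order, the relative variance of f is sum((p_i * sigma_i / x_i) ** 2).
    """
    f = math.prod(x ** p for x, p in zip(values, exponents))
    rel_var = sum((p * s / x) ** 2 for x, s, p in zip(values, errors, exponents))
    return f, abs(f) * math.sqrt(rel_var)


# Hypothetical inputs (illustration only): radius in um, diffusivity in um^2/s.
radius, radius_err = 2.0, 0.1
diffusivity, diffusivity_err = 0.5, 0.05

# Generic diffusive timescale tau ~ R**2 / D, i.e. exponents (2, -1).
tau, tau_err = propagate_power_law(
    [radius, diffusivity], [radius_err, diffusivity_err], [2, -1]
)
print(f"tau = {tau:.2f} +/- {tau_err:.2f} s")
```

      When an input has a large relative uncertainty, the first-order formula becomes a rough guide only, and confidence intervals obtained by resampling may be more appropriate.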

      b. The generalizability of this model is hard to gauge when all comparisons are made to a single experiment reported in a previous paper.

      i. Conceptually, the model is limited to single-component sticker-spacer polymers undergoing phase separation, which is already a very simplified model of condensates. For example, LAF1 droplets in the cell have no perceptible interfacial mass limitations, as also reported in Taylor et al. 2019, so it is unclear how these mechanisms relate to living systems as opposed to specific biochemistry experiments. The authors therefore need to discuss the implications and limitations of their model in the living context, where there are multiple species, finite-size effects, and active processes at play.

      We thank the reviewer for the critical comment. To address this point, we have included a paragraph in the Discussion regarding in vivo situations:

      “In this work, we focused on the exchange dynamics of in vitro single-component condensates. How is the picture modified for condensates inside cells? It has been shown that Ddx4-YFP droplets in the cell nucleus exhibit negligible interface resistance (Taylor et al., 2019), which raises the question whether interface resistance is relevant to natural condensates in vivo. Future quantitative FRAP and single-molecule tracking experiments on different types of droplets in the cell will address this question. One complication is that condensates in cells are almost always multi-component, which can increase the complexity of the exchange dynamics. Interestingly, formation of multiple layers or the presence of excess molecules of one species coating the droplet is likely to increase interface resistance. A notable example is the Pickering effect, in which adsorbed particles partially cover the interface, thereby reducing the accessible area and the overall condensate surface tension, slowing down the exchange dynamics (Folkmann et al., 2021). The development of theory and modeling for the exchange dynamics of multi-component condensates is currently underway. (Lines 323-334)”

      ii. Second, can the authors connect their model to make predictions of the impact of perturbations to LAF-1 on exchange timescales? For example, are mutants (which change the number or positioning of "stickers") expected to show particular trends in conductances or FRAP timescales? Since LAF-1 is a relatively well-studied protein in vitro, can the authors further contrast their expectations with already published datasets that explore these perturbations, even if they don't generate new data?

      Our model is intended to address interface exchange dynamics at the conceptual level. The underlying mechanism for the large interface resistance of LAF-1 droplets could be more complicated than explored in our work. To study the impact of perturbations to LAF-1 on exchange timescales likely requires substantially more sophisticated molecular dynamics simulations. We undertook an extensive search for FRAP experiments on LAF-1 droplets where the whole droplet is photobleached, but were not able to find another dataset. We would be grateful if the reviewer is aware of such data and can point us to it.

      iii. A key prediction of the interface limitation model is the size-dependent crossover in FRAP dynamics. Can the authors reanalyze published data on LAF-1 (albeit of different-size droplets) to check their predictions? At the least, is the crossover radius within experimentally testable limits?

      Based on our prediction, the crossover radius for LAF-1 droplets is around 70 𝜇m. We have added a sentence in the text to point this out:

      “We also predict the crossover for LAF-1 droplets to be around 𝑅 = 71 𝜇m, which in principle can be tested experimentally. (Lines 285-286)”

      Unfortunately, most of the FRAP experiments in Taylor et al. (2019) are partial FRAP experiments, in which only part of the dense phase is photobleached. The recovery time for such experiments reflects primarily the internal mixing speed of the dense phase rather than the exchange dynamics at the interface or transport from the dilute phase.

      c. The authors nicely relate the exchange timescale to various model parameters. Is LAF-1 the only protein for which the various dilute/dense concentrations/diffusivities are known? Given the large number of FRAP and other related studies, can the authors report on a few other model condensate protein systems? This will help broaden the reach of this model in the context of other previously reported data. If such data are lacking, a discussion of this would be important.

      Yes, indeed, we have found numerous publications with FRAP experiments performed on whole droplets of various proteins. However, none of these have provided a complete set of parameters to allow a quantitative analysis. Part of the reason is that it is nontrivial to measure the partition coefficient (cden/cdil) accurately. We have added a sentence in the Discussion to promote future quantitative experiments and analysis of condensate exchange dynamics:

      “We hope that our study will motivate further experimental investigations into the anomalous exchange dynamics of LAF-1 droplets and potentially other condensates, and the mechanisms underlying interface resistance. (Lines 320-322)”

      To broaden the audience for this work in the hope of stimulating such studies, we have also modified the title and abstract so that it will be more visible to the FRAP community:

      “The exchange dynamics of biomolecular condensates (Line 1)”

      “A hallmark of biomolecular condensates formed via liquid-liquid phase separation is that they dynamically exchange material with their surroundings, and this process can be crucial to condensate function. Intuitively, the rate of exchange can be limited by the flux from the dilute phase or by the mixing speed in the dense phase. Surprisingly, a recent experiment suggests that exchange can also be limited by the dynamics at the droplet interface, implying the existence of an “interface resistance”. Here, we first derive an analytical expression for the timescale of condensate material exchange, which clearly conveys the physical factors controlling exchange dynamics. We then utilize sticker-spacer polymer models to show that interface resistance can arise when incident molecules transiently touch the interface without entering the dense phase, i.e., the molecules “bounce” from the interface. Our work provides insight into condensate exchange dynamics, with implications for both natural and synthetic systems. (Lines 16-26)”

      (2) The reported sticker-spacer simulations, while interesting, represent a very small portion of the parameter space. Can the authors - through a combination of simulation, analyses, or physical reasoning, comment on how the features of their underlying microscopic model (sequence length, implicit linker length, relative stoichiometry of A/B for a given length, overall concentration, sequence pattern properties like correlation length) connect to conductance? This will provide more compelling evidence relating their studies beyond the cursory examination of handpicked sequences. A more verbose description of some of the methods would be appreciated as well, including specifically how to (a) calculate the bond lifetime of isolated A-B pair, and (b) how equilibration/convergence of MD simulations is established.

      In our simulation, the interface conductance is essentially controlled by the fraction of unbound stickers, the encounter rate of a pair of unbound stickers, the dilute- and dense-phase concentrations, and the width of the interface. As a result, weaker binding strength and/or deviation of A:B stoichiometry from 1:1 result in a higher interface conductance. A6B6 polymers with long blocks of stickers of the same type (compared to (A2B2)3 and (A3B3)2) have a lower dilute-phase concentration and thinner interface width, so lower conductance. Sequence length and implicit linker length can have more complex effects, which are beyond the scope of the current study. We have now provided an explicit expression for 𝜅 in Equation (14) and added a discussion sentence in the text:

      “More generally, we find that the interface conductance of the sticker-spacer polymers is controlled by the encounter rate of a pair of unbound stickers and the availability of these stickers, which in turn depends on the sticker-sticker binding strength, the dilute- and dense-phase polymer concentrations, and the width of the interface:

      where 𝓃 is the number of monomers in a polymer,  is the global stoichiometry (i.e., ), and are the fractions of unbound A/B monomers in the dilute and dense phases. (Lines 208-214)”

      We have also added a few sentences in Appendix 2 to describe how we calculate the bond lifetime of an isolated A-B pair and how equilibration in simulations is established.

      “Briefly, the bond lifetime of an isolated pair is obtained by simulating a bound pair of A-B stickers in a box and recording the time when they first separate by the cutoff distance of the attractive interaction nm. The mean bond lifetime 𝜏 is found by averaging results of 1000 replicates with different random seeds. (Lines 642-645)”
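      The bond-lifetime procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the input format (a recorded time series of the A-B pair distance for each replicate), and the choice to report a standard error are all assumptions made for the example.

```python
import numpy as np

def first_separation_time(times, distances, cutoff):
    """Return the first recorded time at which the pair distance reaches
    the cutoff, or None if the pair never separates in this trajectory.
    (Hypothetical helper; the input format is an assumption.)"""
    times = np.asarray(times, dtype=float)
    distances = np.asarray(distances, dtype=float)
    crossed = np.nonzero(distances >= cutoff)[0]
    if crossed.size == 0:
        return None
    return float(times[crossed[0]])

def mean_bond_lifetime(lifetimes):
    """Average the first-separation times over replicates and return
    (mean, standard error of the mean)."""
    lifetimes = np.asarray(lifetimes, dtype=float)
    mean = lifetimes.mean()
    sem = lifetimes.std(ddof=1) / np.sqrt(lifetimes.size)
    return float(mean), float(sem)
```

      In use, one would call first_separation_time once per replicate (each run with a different random seed) and pass the collected values to mean_bond_lifetime.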

      “To test if the system has reached equilibrium, we compare the dense- and dilute-phase concentrations derived from the first and second halves of the recorded data. The agreement indicates that the system has reached equilibrium. (Lines 586-589)”
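      The equilibration check quoted above, comparing the first and second halves of the recorded data, might look like the sketch below. The function name and the 5% relative tolerance are assumptions for illustration; the source does not state what level of agreement was required.

```python
import numpy as np

def halves_agree(series, rel_tol=0.05):
    """Compare the mean of the first half of a recorded time series
    (e.g. a dense- or dilute-phase concentration) with the mean of the
    second half. Returns (agree, first_mean, second_mean).
    rel_tol is an assumed threshold, not a value from the source."""
    series = np.asarray(series, dtype=float)
    half = series.size // 2
    first = float(series[:half].mean())
    second = float(series[half:].mean())
    scale = max(abs(first), abs(second), 1e-300)  # guard against division by zero
    rel_diff = abs(first - second) / scale
    return bool(rel_diff <= rel_tol), first, second
```

      The same function would be applied separately to the dense-phase and dilute-phase concentration records; a drift between the two halves suggests the run should be extended.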

      (3) A lot of the main text repeats previously published models (continuum ones in Taylor et al. 2019 and Hubatsch et al., 2021, amongst others) and the idea of interface resistance being limiting was already explored quantitatively in Taylor 2019 (including approximate estimates of mass transfer limitations) - this is fine in context. While the authors do a good job of referring to past work in context, the main results of this paper, in my reading, are:

      - a simplified physical form relating conductance timescales.

      - sticker-spacer simulations probing microscopic origins.

      - analysis of size-dependent FRAP scaling.

      I am stating this not as a major weakness, but, rather - I would recommend summarizing and categorizing the sections to make the distinctions between previously reported work and current advances sufficiently clear.

      We thank the reviewer for a clear summary of the contributions of our work. We have highlighted our main contributions in multiple places:

      “Here, we first derive an analytical expression for the timescale of condensate material exchange, which clearly conveys the physical factors controlling exchange dynamics. We then utilize sticker-spacer polymer models to show that interface resistance can arise when incident molecules transiently touch the interface without entering the dense phase, i.e., the molecules “bounce” from the interface. (Lines 21-25)”

      “In the following, we first derive an analytical expression for the timescale of condensate material exchange, which conveys a clear physical picture of what controls this timescale. We then utilize a “sticker-spacer” polymer model to investigate the mechanism of interface resistance. We find that a large interface resistance can occur when molecules bounce off the interface rather than being directly absorbed. We finally discuss characteristic features of the FRAP recovery pattern of droplets when the exchange dynamics is limited by different factors. (Lines 65-70)”

      “Specifically, we first derived an analytical expression for the exchange rate, which conveys the clear physical picture that this rate can be limited by the flux of molecules from the dilute phase, by the speed of mixing inside the dense phase, or by the dynamics of molecules at the droplet interface. Motivated by recent FRAP measurements Taylor et al. (2019) that the exchange rate of LAF-1 droplets can be limited by interface resistance, which contradicts predictions of conventional mean-field theory, we investigated possible physical mechanisms underlying interface resistance using a “sticker-spacer” model. Specifically, we demonstrated via simulations a notable example in which incident molecules have formed all possible internal bonds, and thus bounce from the interface, giving rise to a large interface resistance. Finally, we discussed the signatures in FRAP recovery patterns of the presence of a large interface resistance. (Lines 291-300)”

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors have obtained an analytical expression that provides intuition about regimes of interfacial resistance that depend on droplet size. Additionally, through simulations, the authors provide microscopic insight into the arrangement of sticky and non-sticky functional groups at the interface. The authors introduce bouncing dynamics for rationalizing quantity recovery timescales.

      I found several sections that felt incomplete or needed revision and additional data to support the central claim and make the paper self-contained and coherent.

      We thank the reviewer for spending time on our manuscript and for the helpful critical comments.

      First, the analytical theory operates with diffusion coefficients for dilute and dense phases. For the dilute phase, this is fine. For the dense phase, I have doubts that dynamics can be described as diffusive. Most likely, dynamics is highly subdiffusive due to crowded, entangled, and viscoelastic environments of densely packed interactive biomolecules. Some explanation and justification are in order here.

      The reviewer is correct in noting that molecules within a condensate can move subdiffusively due to the viscoelastic nature of the condensate. However, subdiffusion only occurs at short time and small length scales; the motion of molecules becomes diffusive at longer time and larger length scales. The crossover time here is the terminal relaxation time, measured to be on the order of milliseconds to seconds for typical condensates (see Alshareedah, Ibraheem, et al. "Determinants of viscoelasticity and flow activation energy in biomolecular condensates" Science Advances 10.7, 2024). We previously have also found that, for sticker-spacer polymers, this relaxation time is determined by the time it takes for a sticker to switch to a new partner (see Ronceray et al. (2022) in References), which is therefore largely determined by the bond lifetime of a sticker pair. The crossover length scale is expected to be comparable to the size of a molecule based on the theory of polymer disentanglement. Importantly, in order for the bleached droplet to recover its fluorescence, the bleached molecules must travel for a much longer time and a much larger length than the crossover time and length. It is therefore expected that the molecules move diffusively on the relevant timescale of a FRAP experiment, albeit with a diffusion coefficient that reflects crowding and entanglement on short time and length scales.

      The second major issue is that I did not find a clean comparison of simulations with the derived analytical expression. Simulations test various microscopic properties on the value of k, which is important. But how do we know that it is the same quantity that appears in the expressions? Also, how can we be sure that analytical expressions can guide simulations and experiments as claimed? The authors should provide sound evidence of the predictive aspect of their derived expressions.

      We thank the reviewer for raising this critical issue. We agree with the reviewer that we did not perform an explicit simulation to validate the developed theory, which leaves a gap between our theory and simulations. The main reason is that simulating an in silico “FRAP experiment” on a 3D droplet is computationally very costly. Nevertheless, following the reviewer’s suggestion, we have now performed such a simulation in which we “bleached” a small A6B6 droplet and measured its recovery time. The good agreement between simulation and theory helps validate our overall combined computational and analytical approach. We have incorporated the new simulation and results into the manuscript. Two new sections including new figures (Figure 4 and Appendix 2 Figure 4) are added: “Direct simulation of droplet FRAP” in the main text (lines 232-261) and “Details of simulation and theory of FRAP recovery of an A6B6 droplet” in Appendix 2 (lines 665-715).

      Are the plots in Figure 4 coming from experiment, theory, and simulation? I could not find any information either in the text or in the caption.

      Figure 4 (now Figure 5) is from theory which uses parameters of the A6B6 system in simulation. We have added the following sentences to clarify:

      “We compare the measured FRAP recovery time for the small droplet (green circle) to theoretical predictions from Equation (6) (gray) and Equations (1) - (4) (black) in Figure 5A. (Lines 255-257)”

      “Figure 5. FRAP recovery patterns for large versus small droplets can be notably different for condensates with a sufficiently large interface resistance. (A) Expected relaxation time as a function of droplet radius for in silico “FRAP experiments” on the A6B6 system. The interface resistance dominates recovery times for smaller droplets, whereas dense-phase diffusion dominates recovery times for larger droplets. Green circle: FRAP recovery time obtained from direct simulation of an A6B6 droplet of radius 37 nm. Black curve: the recovery time as a function of droplet radius from a single exponential fit of the exact solution of the recovery curve from Equations (1) - (4). Gray curve: the recovery time predicted by Equation (6). Yellow, blue, and red curves: the recovery time when dense-phase, dilute-phase, and interface flux limit the exchange dynamics, i.e., the first, second, and last term in Equation (6), respectively. Parameters matched to the simulated A6B6 system in the slab geometry: (B) Time courses of fluorescence profiles for A6B6 droplets of radius  (top) and  (bottom); red is fully bleached, green is fully recovered. These concentration profiles are the numerical solutions of Equations (1) - (3) using the parameters in (A). (Below Figure 5)”
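      The caption above refers to extracting a recovery time from a single exponential fit of the recovery curve. A minimal sketch of such a fit is given below, assuming a normalized recovery of the form f(t) = 1 - exp(-t/τ) and a log-linear least-squares fit; the functional form, the fitting method, and the function name are assumptions for illustration, since the source does not specify how the fit was performed.

```python
import numpy as np

def fit_recovery_time(t, f):
    """Estimate tau from a normalized recovery curve f(t), assuming
    f(t) = 1 - exp(-t / tau). Fits log(1 - f) against t by linear
    least squares; the slope of that line is -1 / tau."""
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    mask = f < 1.0  # log(1 - f) is undefined once recovery is complete
    slope, _intercept = np.polyfit(t[mask], np.log(1.0 - f[mask]), 1)
    return float(-1.0 / slope)
```

      Repeating this fit for recovery curves computed at several droplet radii would give a recovery-time-versus-radius curve of the kind described for panel (A). With noisy data, a nonlinear fit of the curve itself may be more robust than the log-linear form used here.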

    1. Author response:

      We thank the reviewers for their insightful comments on our model and manuscript. In this provisional response, we would like to comment on some of the issues raised and how we plan to address them.

      First, the reviewers correctly pointed out that only a small part of the full model was openly available. We have now rectified this and the full model is available at: https://dataverse.harvard.edu/dataverse/sscx.

      Next, we would like to comment on the perceived lack of clarity of certain descriptions in the manuscript. We note that individual techniques and parts of the model have been developed, justified, and validated in previous publications. This left us with the question of how much of the contents of those papers we should re-describe. Too much, and the manuscript becomes overly long; too little, and the reader cannot gain a sufficient understanding of the model building process. The reviewers' comments made it clear that some aspects of the model should be described in more detail and we plan to address this in a revision. Crucially, one missing item raised by all reviewers was a comparison of local connection probabilities to the literature. This will be provided in the revision. Additionally, the reviewers questioned our decision to use a connectivity algorithm that is not based on direct parameterization of target connection probabilities. While this is a limitation of the algorithm we employed, it also has unique strengths, providing non-random aspects of connectivity that have been proven to be impossible to model with algorithms that enforce given connection probabilities or degree distributions. We plan to explain this better in a revision.

      We will also comment on the challenges associated with the interpretation of experimentally measured connection probabilities and employing them for the parameterization of a biophysically detailed model spanning millimeters.

      The reviewers also suggested several aspects of the model that could be improved. Whilst we see merit in all of them, we would like to briefly comment on model completeness in general. First, this model - and any model - can probably never be considered complete. Instead, the model has to be continuously refined, which one reviewer phrased as the "live nature" of the model. However, to demonstrate the model's utility and justify the expense of modeling, we also have to use the model in projects that explore specific scientific questions. To undertake and complete such a project, one must select and "freeze" a given version of the model; otherwise the project will never conclude. Further, we believe that it is advantageous if several projects use the same version of the model. In that case, a reader who is already familiar with the model from one paper may find it easier to understand other papers using the same model. The goal of this manuscript is to describe the version of the model that we used in several ongoing and concluded follow-up projects, including its limitations and opportunities for refinement. As such, we do not plan to add further improvements to the model for this reviewed pre-print. We will, however, continue to refine the model outside of the scope of this publication. Since we believe the development of bottom-up models is best done in a community-driven manner, we encourage interested parties to participate.

      We invite anyone with ideas of how the model could be refined to contact us to discuss how we could integrate these changes into the model together using our tools.

    1. Author response:

      eLife assessment

      This important study reports numerous attempts to replicate reports on transgenerational inheritance of a learned behavior, pathogen avoidance, in C. elegans. While the authors observe parental effects that are limited to a single generation (also called intergenerational inheritance), the authors failed to find any evidence for transmission over multiple generations, or transgenerational inheritance. The experiments presented are meticulously described, making for compelling evidence that in the authors' hands transgenerational inheritance cannot be observed, although there remains the possibility that subtle differences in culture conditions or lab environment explain the failure to reproduce previous observations. Given the prominence of the original reports of transgenerational inheritance, the present study is of broad interest to anyone studying genetics, epigenetics, or learned behavior.

      Thank you for your considered reviews and advice on how to improve our manuscript. We appreciate that the editors and reviewers felt that our manuscript addressed an important issue and acknowledged the difficulty of publishing negative results. We will revise the manuscript and consider all the concerns raised by the editor and referees.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report an inability to reproduce a transgenerational memory of avoidance of the pathogen PA14 in C. elegans. Instead, the authors demonstrate intergenerational inheritance for a single F1 generation, in embryos of mothers exposed to OP50 and PA14, where embryos isolated from these mothers by bleaching are capable of remembering to avoid PA14 in a manner that is dependent on systemic RNAi proteins sid-1 and sid-2. This could reflect systemic sRNAs generated by neuronal daf-7 signaling that are transmitted to F1 embryos. The authors note that transgenerational memory of PA14 was reported by the Murphy group at Princeton, but that environmental or strain variation (worms or bacteria) might explain the single generation of inheritance observed at Harvard. The Hunter group tried different bacterial growth conditions and different worm growth temperatures for independent PA14 strains, which they showed to be strongly pathogenic. However, the authors could not reproduce a transgenerational effect at Harvard. This important data will allow members of the scientific community to focus on the robust and reproducible inheritance of PA14 avoidance transmitted to F1 embryos of mothers exposed to PA14, which the authors demonstrate depends on small RNAs in a manner that is downstream of or in parallel to daf-7. This paper honestly and importantly alters expectations and questions the model that avoidance of PA14 is mediated by a bacterial ncRNA whose siRNAs target a C. elegans gene. Instead, endogenous C. elegans sRNAs that affect pathogen response may be the culprit that explains sRNA-mediated avoidance.

      Overall, this is an important paper that demonstrates that one model for transgenerational inheritance in C. elegans is not reproducible. This is important because it is not clear how many of the reported models of transgenerational inheritance reported in C. elegans are reproducible. The authors do demonstrate a memory for F1 embryos that could be a maternal effect, and the authors confirm that this is mediated by a systemic small RNA response. There are several points in the manuscript where a more positive tone might be helpful.

      We would like to correct the statement made in the second to last sentence. The demonstration of an F1 response to PA14 was first reported by Moore et al., (2019) and then by Pereira et al., (2020) using a different behavioral assay. We merely confirmed these results in our hands, and confirmed the observation, first reported by Kaletsky et al., (2020), that sid-1 and sid-2 are required for this F1 response; although we did find that sid-1 and sid-2 are not required for the PA14-induced increase in daf-7p::gfp expression in ASI neurons in the F1 progeny of trained adults, which had not been addressed in the published work.

      Yes, the intergenerational F1 response could be a maternal effect, but the in utero F1 embryos and their precursor germ cells were directly exposed to PA14 metabolites and toxins (non-maternal effect) as well as any parental response, whether mediated by small RNAs, prions, hormones, or other unknown information carriers. While the F1 aversion response does require sid-1 and sid-2, we would not presume that the substrate is therefore an RNA molecule, particularly because the systemic RNAi response supported by sid-1 and sid-2 is via long double-stranded RNA. To date, no evidence suggests that either protein transports small RNAs, particularly single-stranded RNAs.

      Strengths:

      The authors note that the high copy number daf-7::GFP transgene used by the Murphy group displayed variable expression and evidence for somatic silencing or transgene breakdown in the Hunter lab, as confirmed by the Murphy group. The authors nicely use single copy daf-7::GFP to show that neuronal daf-7::GFP is elevated in F1 but not F2 progeny with regards to the memory of PA14 avoidance, speaking to an intergenerational phenotype.

      The authors nicely confirm that sid-1 and sid-2 are generally required for intergenerational avoidance of F1 embryos of moms exposed to PA14. However, these small RNA proteins did not affect daf-7::GFP elevation in the F1 progeny. This result is unexpected given previous reports that single copy daf-7::GFP is not elevated in F1 progeny of sid mutants. Because the Murphy group reported that daf-7 mutation abolishes avoidance for F1 progeny, this means that the sid genes function downstream of daf-7 or in parallel, rather than upstream as previously suggested.

      The authors studied antisense small RNAs that change in Murphy data sets, identifying 116 mRNAs that might be regulated by sRNAs in response to PA14. Importantly, the authors show that the maco-1 gene, putatively targeted by piRNAs according to the Kaletsky 2020 paper, displays few siRNAs that change in response to PA14. The authors conclude that the P11 ncRNA of PA14, which was proposed to promote interkingdom RNA communication by the Murphy group, is unlikely to affect maco-1 expression by generating sRNAs that target maco-1 in C. elegans. The authors define 8 genes based on their analysis of sRNAs and mRNAs that might promote resistance to PA14, but they do not further characterize these genes' role in pathogen avoidance. The Murphy group might wish to consider following up on these genes and their possible relationship with P11.

      Weaknesses:

      This very thorough and interesting manuscript is at times pugnacious.

      We reiterate that we never claimed that Moore et al., (2019) did not obtain their reported results. We simply stated that we could not replicate their results using the published methods and then failed in our search to identify variable(s) that might account for our results. We will do better when revising the manuscript to make clear, unmuddied statements of facts and state that future investigations may provide independent evidence that supports the original claims and explains our divergent results.

      Please explain more clearly what is High Growth media for E. coli in the text and methods, conveying why it was used by the Murphy lab, and if Normal Growth or High Growth is better for intergenerational heritability assays.

      We used the standard recipes as described in Moore et al., (2021), and will add the recipes and some of the relevant commentary from the paragraphs below to the methods and text as appropriate.

      Normal Growth (NG) media minimally supports OP50 growth, resulting in a thin lawn that minimally obscures viewing larvae and embryos. High Growth (HG) media contains 8X more peptone, which supports much higher OP50 growth, resulting in a thick bacterial lawn that supports larger worm populations. The thicker bacterial lawn can also compromise agar integrity, and the higher worm density encourages worm burrowing behavior, thus the HG plates also have 75% more agar to inhibit worm burrowing. 

      Our results (Figure 4) show that worms grown on OP50 seeded NG or HG plates show different choice responses (PA14 vs OP50). As for experimental “advice”, we would caution our colleagues to not assume that OP50 is a neutral food and to be aware that how you grow and store OP50 (or any bacterial culture that is to be used as food for worms) may have a significant effect on the phenotype you are studying. 

      Reviewer #2 (Public Review):

      This paper examines the reproducibility of results reported by the Murphy lab regarding transgenerational inheritance of a learned avoidance behavior in C. elegans. It has been well established by multiple labs that worms can learn to avoid the pathogen Pseudomonas aeruginosa (PA14) after a single exposure. The Murphy lab has reported that learned avoidance is transmittable to 4 generations and dependent on a small RNA expressed by PA14 that elicits the transgenerational silencing of a gene in C. elegans. The Hunter lab now reports that although they can reproduce inheritance of the learned behavior by the first generation (F1), they cannot reproduce inheritance in subsequent generations.

      This is an important study that will be useful for the community. Although they fail to identify a "smoking gun", the study examines several possible sources for the discrepancy, and their findings will be useful to others interested in using these assays. The preference assay appears to work in their hands inasmuch as they are able to detect the learned behavior in the P0 and F1 generations, suggesting that the failure to reproduce the transgenerational effect is not due to trivial mistakes in the protocol. An obvious reason, however, to account for the differing results is that the culture conditions used by the authors are not permissive for the expression of the small RNA by PA14 that the Murphy lab identified as required for transgenerational inheritance. It would seem prudent for the authors to determine whether this small RNA is present in their cultures, or at least acknowledge this possibility.

      We note that Kaletsky et al., (2020) (Figure 3L) showed that PA14 ΔP11 bacteria failed to induce an F1 avoidance response. Thus, the fact that we observed F1 avoidance implies that our culture conditions successfully induced P11 expression. We believe that this addresses the concern raised here. We thank the reviewer for raising this issue and we will add a statement to this effect in the revised manuscript.

      The authors should also note that their protocol was significantly different from the Murphy protocol (see comments below) and therefore it remains possible that protocol differences cumulatively account for the different results.

      We disagree. Our adjustments to the core protocol were minor and, where possible, were explicitly tested in side-by-side experiments. To discover the source(s) of discrepancy between our results and the published results we subsequently introduced variations to this core protocol to exclude likely variables (worm and bacteria growth temperatures, assay conditions, worm handling methods, bacterial culture and storage conditions, and some minor developmental timing issues). To substantiate these assertions, we will, upon revision, add the precise protocol we followed for the aversion assay to the supplemental documents, provide some additional experimental results supporting these claims, and further clarify which presented experiments included protocol variations (e.g. sodium azide or cold immobilization). It remains possible that we misunderstood the published protocol, but we were highly motivated to replicate the results and read every published version with extreme care.

      Reviewer #3 (Public Review):

      Summary:

      It has been previously reported in many high-profile papers, that C. elegans can learn to avoid pathogens. Moreover, this learned pathogen avoidance can be passed on to future generations - up to the F5 generation in some reports. In this paper, Gainey et al. set out to replicate these findings. They successfully replicated pathogen avoidance in the exposed animals, as well as a strong increase in daf-7 expression in ASI neurons in F1 animals, as determined by a daf-7::GFP reporter construct. However, they failed to see strong evidence for pathogen avoidance or daf-7 overexpression in the F2 generation. The failure of replication is the major focus of this work.

      Given their failure to replicate these findings, the authors embark on a thorough test of various experimental confounders that may have impacted their results. They also re-analyze the small RNA sequencing and mRNA sequencing data from one of the previously published papers and draw some new conclusions, extending this analysis.

      Strengths:

      (1) The authors provide a thorough description of their methods, and a marked-up version of a published protocol that describes how they adapted the protocol to their lab conditions. It should be easy to replicate the experiments.

      (2) The authors test the source of bacteria, growth temperature (of both C. elegans and bacteria), and light/dark husbandry conditions. They also supply all their raw data, so that the sample size for each testing plate can be easily seen (in the supplementary data). None of these variations appears to have a measurable effect on pathogen avoidance in the F2 generation, with all but one of the experiments failing to exhibit learned pathogen avoidance.

      (3) The small RNA seq and mRNA seq analysis is well performed and extends the results shown in the original paper. The original paper did not give many details of the small RNA analysis, which was an oversight. Although not a major focus of this paper, it is a worthwhile extension of the previous work.

      (4) It is rare that negative results such as these are accessible. Although the authors were unable to determine the reason that their results differ from those previously published, it is important to document these attempts in detail, as has been done here. Behavioral assays are notoriously difficult to perform and public discourse around these attempts may give clarity to the difficulties faced by a controversial field.

      Thank you for your support. Choosing to pursue publication of these negative results was not an easy decision, and we thank members of the community for their support and encouragement.

      Weaknesses:

      (1) Although the "standard" conditions have been tested over multiple biological replicates, many of the potential confounders that may have altered the results have been tested only once or twice. For example, changing the incubation temperature to 25°C was tested in only two biological replicates (Exp 5.1 and 5.2) - and one of these experiments actually resulted in apparent pathogen avoidance inheritance in the F2 generation (but not in the F1). An alternative pathogen source was tested in only one biological replicate (Exp 3). Given the variability observed in the F2 generation, increasing biological replicates would have added to the strengths of the report.

      We agree that our study was not exhaustive in our exploration of variables that might be interfering with our ability to detect F2 avoidance. We also note that some of these variations failed (with many more independent experiments) to induce elevated daf-7p::gfp expression in ASI neurons in F2 progeny. Our goal was not to show that variation in some growth or assay condition would generate reproducible negative results; rather, the exploration was designed to tweak conditions to enable detection of a robust F2 response. Given the strength of the data presented in Moore et al. (2019), we expected that adjustment of the problematic variable would produce positive results apparent in a single replicate, which could then be followed up. If we had succeeded, then we would have documented the conditions that enabled robust F2 inheritance and would have explored molecular mechanisms that support this important but mysterious process.

      (2) A key difference between the methods used here and those published previously, is an increase in the age of the animals used for training - from mostly L4 to mostly young adults. I was unable to find a clear example of an experiment when these two conditions were compared, although the authors state that it made no difference to their results.

      We can state firmly that the apparent time delay did not affect P0 learned avoidance or, as documented in Table S1, daf-7p::gfp expression in ASI neurons. In our experience, training mostly L4’s on PA14 frequently failed to produce sufficient F1 embryos for both F1 avoidance assays or daf-7p::gfp measurements in ASI neurons and collection of F2 progeny. Indeed, in early attempts to detect heritable PA14 aversion, trained P0 and F1 progeny were not assayed in order to obtain sufficient F2’s for a choice assay. These animals failed to display aversion, but without evidence of successful P0 training or an F1 intergenerational response this was deemed a non-fruitful trouble-shooting approach. We will add to our supplemental figures P0 choice results from experiments using younger trained animals that failed to produce sufficient F1’s to continue the inheritance experiments. 

      The different timing between the two protocols may reflect the age of the recovered bleached P0 embryos. It is reasonable to assume that bleaching day 1 adults vs day 2 adults from the P-1 population could shift the average age of recovered P0 embryos by several hours. The Murphy protocol only states that P0 embryos were obtained by bleaching healthy adults. Regardless, if the hypothesis entertained here is true, that a several-hour difference in larval/adult age during 24 hours of training affects F2 inheritance of learned aversion but does not affect P0 learned avoidance, then we would argue that this paradigm for heritable learned avoidance, as described in Moore et al. (2019, 2021), is not sufficiently robust for mechanistic investigations.

      (3) The original paper reports a transgenerational avoidance effect up to the F5 generation. Although in this work the authors failed to see avoidance in the F2 generation, it would have been prudent to extend their tests for more generations in at least a couple of their experiments to ensure that the F2 generation was not an aberration (although this reviewer acknowledges that this seems unlikely to be the case).

      Citations

      Moore, R.S., Kaletsky, R., and Murphy, C.T. (2019). Piwi/PRG-1 Argonaute and TGF-beta Mediate Transgenerational Learned Pathogenic Avoidance. Cell 177, 1827-1841 e1812.

      Pereira, A.G., Gracida, X., Kagias, K., and Zhang, Y. (2020). C. elegans aversive olfactory learning generates diverse intergenerational effects. J Neurogenet 34, 378-388.

      Kaletsky, R., Moore, R.S., Vrla, G.D., Parsons, L.R., Gitai, Z., and Murphy, C.T. (2020). C. elegans interprets bacterial non-coding RNAs to learn pathogenic avoidance. Nature 586, 445-451.

      Moore, R.S., Kaletsky, R., and Murphy, C.T. (2021). Protocol for transgenerational learned pathogen avoidance behavior assays in Caenorhabditis elegans. STAR Protoc 2, 100384.

    1. Author response:

      We appreciate the time and effort that you and the reviewers have dedicated to providing valuable feedback on our manuscript. We are grateful to the reviewers for their insightful comments.

      Reviewer #1: We thank the reviewer for the positive comments made on our manuscript.

      Reviewer #2: We thank the reviewer for these positive remarks.

      Concerning the main weakness highlighted by the reviewer:

      We presented results in our submitted work both without noise and with a signal-to-noise ratio (SNR) equal to 50. Figure 5 shows exemplar posterior distributions obtained in a noise-free scenario, and Table 1 reports the number of degeneracies for each model on 10000 noise-free simulations. These results highlight that the presence of degeneracies is inherent to the model definition. Figures 3, 6 and 7 present results considering an SNR of 50. Results with lower SNR have indeed not been included in this work. We agree that a figure showing the impact of noise on the posterior distributions will be a good addition, and we will include it in the second version, as suggested.

    1. Author response:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 3B was not cited in the manuscript.

      We have now included the citation for Figure 3B in the main text: “….whereas NSP13-R567A (lost ATP consumption) and NSP13-K345A/K347A (obstructed the nucleic acid binding channel) failed to inhibit YAP activity (Figure 3B).” (Please see the revised manuscript) 

      Reviewer #2 (Recommendations For The Authors):

      (2) In Figure 1, ciliated cells are marked as a separate cluster from "epithelial cells". Since ciliated cells are epithelial cells, I suggest changing the nomenclature of the clusters.

      We have updated the label from “Ciliated” to “Ciliated Epithelial” in Figure 1A, as suggested. (Please see the revised manuscript)

      (3) Outlines of planned revisions: 1) Reanalyze snRNA-seq and bulk RNA-seq data from Figure 1 to investigate YAP target genes related to innate immune response; 2) Employ ChIP-seq to determine whether NSP13 WT or mutants (K131, K345/K347, and R567) prevent the YAP/TEAD complex from binding to DNA by occupying the TEAD DNA binding site, providing insights into the mechanism; 3) Validate NSP13-interacting proteins using Immunoprecipitation-Western Blot (IP-WB) assays based on mass spectrometry results; 4) Perform bulk RNA sequencing in cells with or without NSP13 expression to assess endogenous YAP target gene expression.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, they now provide an overview image, next to zoomed details. However, from these images one cannot conclude 'by eye' any clustering event. This aligns with the very low r values. All neurons in the field show variable activity and a clustering is not really evident from these examples. Even within a cluster, there is variability. The authors now confirm that expression levels are indeed variable but are independent from the ratio measurements. Further, they controlled for specificity by including DAPT treatments, but opposite to their own in vitro data (in primary neurons) the ratios increased. The authors argue that both distance and orientation can either decrease or increase ratios and that the use of this biosensor should be explored model-by-model. This doesn't really confer high confidence and may hinder other groups in using this sensor reliably.

      Secondly, there is still no physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. The authors acknowledge this shortcoming but argue that this is for a follow-up study.

      For instance, they only monitor activity in cell bodies, and miss all info on g-sec activity in neurites and synapses: what is the relevance of the cell body associated g-sec and can it be used as a proxy for neuronal g-sec activity? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons.

      Without some more validation and physiologically relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      The effect size was small, as stated in the original and revised manuscripts and the point-by-point responses to the 1st round review. Such subtle effects will likely be challenging to detect by eye. However, our unbiased quantification allowed us to detect a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in neighboring neurons, which we have verified using many different approaches (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was canceled by the administration of γ-secretase inhibitor (Figure 5). This objective analysis made us more confident in concluding that γ-secretase affects γ-secretase in neighboring neurons.

      We would also like to make clear the design of the C99 720-670 biosensor. Both C99, the sensing domain that is cleaved by γ-secretase, and the anchoring domain fused to miRFP670 are integrated into the membrane (Figure 1A). Therefore, how these two domains with four transmembrane regions are embedded in the membrane should affect the orientation between the donor, miRFP670, and the acceptor, miRFP720. As noted in our point-by-point responses to the initial review, we have previously validated that pharmacological inhibition of γ-secretase significantly increases the FRET ratio in various cell lines, including CHO, MEF, BV2 cells, and mouse cortical primary neurons (Maesako et al., 2020; Houser et al., 2020, and unpublished observations). On the other hand, FRET reduction by γ-secretase inhibition was found in mouse primary neurons derived from the cerebellum (unpublished observations) as well as the somatosensory cortex neurons in vivo (this study). While we could not use the exact same imaging set-up for cortical primary neurons in vitro and those in vivo due to different expression levels of the biosensor, we could do so for in vitro cortical primary neurons vs. in vitro cerebellum neurons. This direct comparison showed that 720/670 ratios are significantly higher in the cerebellum than in the cortex neurons even in the presence of 1 mM DAPT (Author response image 1), a concentration that nearly completely inhibits γ-secretase activity. This suggests a different integration and stabilization pattern of the sensing and anchoring domains in the C99 720-670 biosensor between the cortex and cerebellum primary neurons, and thus, orientation between the donor and acceptor varies in the two neuronal types. We expect a similar scenario between cortical primary neurons in vitro and those in vivo.
Of note, we have recently demonstrated that the cortex and cerebellum primary neurons exhibit distinct membrane properties (Lundin and Wieckiewicz et al., 2024 in revision), suggesting the different baseline FRET could be related to the different membrane properties between the cortex and cerebellum primary neurons. On the other hand, this raises a concern that 720/670 ratios can be affected not only by γ-secretase activity but also by other confounders, such as altered membrane properties. However, a small but significant correlation between the 720/670 ratio in a neuron and those ratios in its neighboring neurons is canceled by γ-secretase inhibitor (Figure 5), suggesting that the correlation between the 720/670 ratio in a neuron and those in its neighboring neurons is most likely dependent on γ-secretase activity. Taken together, we currently think orientation plays a significant role in our biosensor and would like to emphasize the importance of establishing, on a model-by-model basis, whether the cleavage of the C99 720-670 biosensor by γ-secretase increases or decreases 720/670 FRET ratios.

      Author response image 1.

      Furthermore, we co-expressed the C99 720-670 biosensor and visible range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortex primary neurons. Interestingly, several biological events uniquely detected in the neurons with higher 720/670 ratios, which are expected to exhibit lower endogenous γ-secretase activity, are recapitulated by pharmacological inhibition of γ-secretase (unpublished observations), ensuring that higher 720/670 ratios are indicative of lower γ-secretase activity in mouse cortex primary neurons. Such multiplexed imaging will help to further elucidate how the C99 720-670 biosensor behaves in response to the modulation of γ-secretase activity.

      Lastly, the scope of this study was to develop and validate a novel imaging assay employing a NIR FRET biosensor to measure γ-secretase activity on a cell-by-cell basis in live wild-type mouse brains. However, we do appreciate the reviewer’s suggestion and think employing this new platform in FAD PSEN1 knock-in (KI) or PSEN1 conditional knockout (cKO) mice would provide valuable information. Furthermore, we are keen to expand our capability to monitor γ-secretase with subcellular resolution in live mouse brains in vivo, which we will explore in follow-up studies. Thank you for your thoughtful suggestions.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139.

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Lundin B, Wieckiewicz N, Dickson JR, Sobolewski RGR, Sadek M, Armagan G, Perrin F, Hyman BT, Berezovska O, and Maesako M. APP is a regulator of endo-lysosomal membrane permeability. 2024 in revision

      Reviewer #2 (Public Review):

      Regarding the variability and spatial correlation- the dynamic range of the sensor previously reported in vitro is in the range of 20-30% change (Houser et al 2020) whereas the range of FR detected in vivo between cells is significantly larger in this MS. This raises considerable doubts about specific detection of cellular activity.

      One direct way to test the dynamic range of the sensor in vivo is to increase or decrease endogenous gamma-secretase activity and to ensure this experimental design allows accurate monitoring of gamma-secretase activity. In the previous characterization of the reporter (Houser et al 2020), DAPT application and inhibition of gamma-secretase activity results in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, the authors repeated the experiment, and surprisingly found an opposite effect, in which DAPT significantly reduced FR.

      The authors maintain that this result could be due to differences in cell types. However, this experiment was previously performed in cultured cortical neurons and many different cell types, as noted by the authors in their rebuttal.

      Instead, I would argue that these results further highlight the concerns of using FR in vivo, since based on their own data, there is no way to interpret this quantification. If DAPT reduces FR, does this mean we should now interpret higher FR as corresponding to higher g-sec activity? Given a number of papers from the authors claiming otherwise, I do not understand how one can interpret the results as indicating a cell-specific effect.

      In conclusion, without any ground truth, it is impossible to assess and interpret what FR measurements of this sensor in vivo mean. Therefore, the use of this approach as a way to study g-sec activity in vivo seems premature.

      Please find our response to reviewer 1’s similar critique above. Here, we would like to clarify again the design of our C99 720-670 biosensor. The orientation between the donor, miRFP670, and acceptor, miRFP720, depends on how C99, the sensing domain that is cleaved by γ-secretase, and the anchoring domain are integrated into the membrane (Figure 1A). Although it was surprising to us, it is possible that γ-secretase inhibition decreases 720/670 ratios if 1) the donor-acceptor orientation plays a significant role in FRET and 2) the baseline structure of the C99 720-670 biosensor is different between cell types. This appears to be the case between the cortex and cerebellum primary neurons (i.e., DAPT increases 720/670 ratios in the cortex neurons while decreasing them in the cerebellum neurons), and we expect it in cortical neurons in vitro vs. in vivo as well. Hence, we recommend that users first validate whether the cleavage of the C99 720-670 biosensor by γ-secretase increases or decreases 720/670 FRET ratios in their models. If DAPT increases 720/670 ratios (as in cortex primary neurons, CHO, MEF, and BV2 cells that we have validated), higher ratios should be interpreted as lower γ-secretase activity. If DAPT reduces 720/670 ratios (as in cerebellum primary neurons and the somatosensory cortex neurons in vivo), higher ratios should be interpreted as higher γ-secretase activity. From a biosensing perspective, although we need to know which is the case on a model-by-model basis, we think that whether γ-secretase activity increases or decreases the 720/670 ratio is not critical; rather, whether it can significantly change FRET efficiency is more important. Thank you for your critical comments.
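
      As an illustration only (not the authors' analysis code), the model-by-model calibration rule described above can be sketched as a small helper. The function name and the two input lists (per-cell 720/670 ratios with vehicle and with inhibitor) are hypothetical, and the sketch compares group means without the significance test a real calibration would need.

```python
from statistics import mean

def ratio_direction(vehicle_ratios, inhibitor_ratios):
    """Classify how the 720/670 ratio responds to inhibition in one model.

    Hypothetical helper: compares group means only; a real calibration
    would also test whether the difference is statistically significant.
    """
    delta = mean(inhibitor_ratios) - mean(vehicle_ratios)
    if delta > 0:
        # Inhibition raises the ratio, so a high ratio reports low activity.
        return "higher ratio = lower activity"
    if delta < 0:
        # Inhibition lowers the ratio, so a high ratio reports high activity.
        return "higher ratio = higher activity"
    return "undetermined"
```

      Under these assumptions, a model in which the inhibitor raises the mean ratio would be read with the inverse convention, and a model in which it lowers the mean ratio with the direct convention.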

      Reviewer #3 (Public Review):

      This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state-of-the-art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      The following opportunity for improving the system didn't initially present itself until the authors performed an important test of the FRET sensor in vivo following DAPT treatment. The authors get credit for diligently reporting the unexpected decrease in 720/670 FRET ratio. In turn this has led to a suggestion that this sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      From previous results in cultured neurons, the authors expected an increase in FRET following DAPT treatment in vivo. These expectations fit with the sensor's mode-of-action because a block of gamma-secretase activity should retain the fluorophores in proximity. When the authors observed decreased FRET, the conclusion was that the sensor performs differently in different cellular contexts. However, a major concern is that mechanistically it is unclear how this could occur with this type of sensor. The relative orientation of fluorophores indeed can contribute to FRET efficiency in tension-based sensors. However, the proteolysis expected with gamma-secretase activity would release tension and orientation constraints. Thus, the major contributing FRET factor is expected to be distance, not orientation. Alternative possibilities that could inadvertently affect readouts include an additional DAPT target in vivo sequestering the inhibitor, secondary pH effects on FRET, photo-bleaching, or an unidentified fluorophore quencher in vivo stimulated by DAPT. Ultimately this new FRET sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      Given that the anchoring domain is composed of three transmembrane regions and the linker connecting the donor, miRFP670, and the acceptor, miRFP720, is highly flexible, we are still not sure if the orientation constraint of the C99 720-670 biosensor is canceled by γ-secretase cleavage. This means that the orientation between the donor and acceptor in the cleaved form of the sensor can differ from model to model. As explained in response to the similar critique of reviewer 1, we found that the 720/670 ratio is significantly higher in the cerebellum than in the cortex neurons even in the presence of DAPT (Figure 1 for the review only). Therefore, we currently think the donor-acceptor orientation, both in the cleaved and non-cleaved forms of the sensor, plays a role in determining whether γ-secretase activity increases or decreases the 720/670 ratio (but this view may change depending on future discoveries).

      As the reviewer pointed out, a NIR γ-secretase biosensor with no biological activity would be an important control; however, a point mutation in the transmembrane region of the C99 sensing domain could also alter the orientation between the donor, miRFP670, and the acceptor, miRFP720, since C99 is connected to the acceptor, which may bring additional complexity. Also, as noted in our point-by-point responses to the initial review, the mutation(s) that can fully block C99 processing by γ-secretase have not been established. Therefore, we asked if the subtle but significant correlation we found between the 720/670 ratio in a neuron and those ratios in its neighboring neurons is canceled by γ-secretase inhibitor administration. Since the correlation was abolished (Figure 5), it suggests that the correlation between the 720/670 ratio in a neuron and those ratios in the neighboring neurons depends on γ-secretase activity.

      It is not fully established how γ-secretase activity is spatiotemporally regulated; therefore, the development of more appropriate control biosensors and further validation of our findings with complementary approaches would be crucial in our follow-up studies. Thank you for your valuable comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, the images do not convince me as they show only limited areas of interest: from these examples (for instance fig 5) one sees that nearly all neurons in the field show variable activity and a clustering is not really evident. Even within a cluster, there is variability. With r values between 0.23 and 0.36, the correlation is not that striking. The authors herein do not control for expression levels of the sensor: for instance, can they show that in all neurons in the field, the sensor is equally expressed, but FRET activity is correlated in sets of neurons? Or are the FRET activities measured only in positively transduced neurons, while neighboring neurons are not expressing the sensor? Without such validation, it is difficult to make this conclusion.

      We appreciate the reviewer’s comment. We agree with the reviewer that this study is not testing a new hypothesis but rather developing and validating a novel tool. However, we do believe such a “technical note” is as important as a “research paper” since advancing technique(s) is the only way to break the barrier in our understanding of complex biological events. Therefore, this study aimed to develop and validate a novel imaging assay employing a recently engineered NIR FRET biosensor to measure γ-secretase activity (Houser et al., 2020) on a cell-by-cell basis in live mouse brains, enabling us for the first time to examine how γ-secretase activity is regulated in individual neurons in vivo, and uncover that γ-secretase activity may influence γ-secretase in neighboring neurons. Like the reviewer, we found that the cell-to-cell correlation is not that striking, as we clearly stated in the original manuscript: “Although the effect size is modest, we also found a statistically significant correlation between…” 

      We were also aware that there is variability in a cluster of neurons exhibiting similar γ-secretase activities. Per the reviewer’s request, the images have been expanded to the entire imaging field of view (new Figure 3A). Although the effect size is small, our unbiased quantification showed a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in five neighboring neurons (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was canceled by the administration of γ-secretase inhibitor (Figure 5). These findings made it impossible to conclude that γ-secretase does not affect γ-secretase in neighboring neurons.
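
      For readers who want to see the shape of such a neighbor analysis, the following is a minimal sketch and not the authors' pipeline. It assumes each cell has a centroid coordinate and a 720/670 ratio, defines "neighbors" as the k nearest cells by Euclidean distance (an assumption; the response only states that five neighboring neurons were averaged), and reports the Pearson correlation between each cell's ratio and the mean ratio of its neighbors. All function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neighbor_mean_ratios(coords, ratios, k=5):
    """Mean ratio of each cell's k nearest neighbors (the cell itself excluded)."""
    means = []
    for i, p in enumerate(coords):
        # Distance from cell i to every other cell, sorted nearest first.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        nearest = [j for _, j in others[:k]]
        means.append(sum(ratios[j] for j in nearest) / len(nearest))
    return means

def neighbor_correlation(coords, ratios, k=5):
    """Correlation between each cell's ratio and its neighbors' mean ratio."""
    return pearson_r(ratios, neighbor_mean_ratios(coords, ratios, k))
```

      A call such as `neighbor_correlation(coords, ratios, k=5)` returns a single r value per field of view; comparing that value with and without inhibitor mirrors the logic of the comparison described above, although the significance testing used in the study is not reproduced here.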

      Regarding the expression levels and pattern of the sensor, an AAV-based gene delivery approach employed in this study results in the expression of the sensor not in all but in selected neurons. We have newly performed immunohistochemistry, showing that approximately 40% of NeuN-positive neurons express the C99 720-670 biosensor (new Figure 1—figure supplement 2A and 2B).

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      (2) Secondly, I am lacking some more physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. Or what would be the outcome if the sensor was targeted to glial cells?

      The AAV vector in this study encodes the human synapsin promoter and our new immunohistochemistry demonstrates that nearly 100% of the cells expressing the C99 720-670 sensor are NeuN positive, and we hardly detected the sensor expression in Iba-1 or GFAP-positive cells (new Figure 1—figure supplement 2A and 2C).

      The mechanism underlying the cell non-autonomous regulation of γ-secretase remains unclear. As discussed in our manuscript, one potential hypothesis is that secreted abeta42 plays a role (Zoltowska et al., 2023 eLife). Whereas this report focuses on the development and validation of a novel assay using wildtype mice, future follow-up studies employing FAD PSEN1 knock-in (KI) and PSEN1 conditional knockout (cKO) mice would allow us to test the hypothesis above, since abeta42 is known to increase in some FAD PSEN1 KI mice (Siman et al., 2000 J Neurosci, Vidal et al., 2012 FASEB J) while it decreases in PSEN1 cKO mice (Yu et al., 2001 Neuron).

      Reference

      - Siman R, Reaume AG, Savage MJ, Trusko S, Lin YG, Scott RW, Flood DG. Presenilin-1 P264L knockin mutation: differential effects on abeta production, amyloid deposition, and neuronal vulnerability. J Neurosci. 2000 Dec 1;20(23):8717-26. 

      - Vidal R, Sammeta N, Garringer HJ, Sambamurti K, Miravalle L, Lamb BT, Ghetti B. The Psen1-L166P-knock-in mutation leads to amyloid deposition in human wild-type amyloid precursor protein YAC transgenic mice. FASEB J. 2012 Jul;26(7):2899-910.

      - Yu H, Saura CA, Choi SY, Sun LD, Yang X, Handler M, Kawarabayashi T, Younkin L, Fedeles B, Wilson MA, Younkin S, Kandel ER, Kirkwood A, Shen J. APP processing and synaptic plasticity in presenilin-1 conditional knockout mice. Neuron. 2001 Sep 13;31(5):713-26. 

      - Zoltowska KM, Das U, Lismont S, Enzlein T, Maesako M, Houser MC, Franco ML, Moreira DG, Karachentsev D, Becker A, Hopf C, Vilar M, Berezovska O, Mobley W, Chávez-Gutiérrez L. Alzheimer's disease linked Aβ42 exerts product feedback inhibition on γ-secretase impairing downstream cell signaling. eLife. 2023. 12:RP90690

      (3) For this reviewer it is not clear what resolution they are measuring activity, at cellular or subcellular level? In other words are the intensity spots neuronal cell bodies? Given g-sec activity are in all endosomal compartments and at the cell surface, including in the synapse, does NIR imaging have the resolution to distinguish subcellular or surface localized activities? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons: is this possible to assess with the current setup? 

      Since this study aimed to determine how γ-secretase activity is regulated on a cell-by-cell basis in live mouse brains, the FRET signal was detected in neuronal cell bodies. While our current set-up for in vivo can only record γ-secretase activity with a cellular resolution, we previously detected predominant γ-secretase activity in the endo-lysosomal compartments (Maesako et al., 2022 J Neurosci) as well as in certain spots of neuronal processes (Maesako et al., 2020 iScience) in cultured primary neurons using the same microscope set-up. Therefore, future studies will expand our capability to monitor γ-secretase with subcellular resolution in live mouse brains in vivo.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Maesako M, Houser MCQ, Turchyna Y, Wolfe MS, Berezovska O. Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci. 2022 Jan 5;42(1):145-154. 

      (4) Without some more validation and physiological relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      Please find our response above to the critique (1).  

      Reviewer #2 (Public Review):

      (1) Regarding the variability and spatial correlation: the dynamic range of the sensor previously reported in vitro is in the range of 20-30% change (Houser et al 2020), whereas the range of FR detected in vivo between cells is significantly larger (Fig. 3). This raises considerable doubts about the specific detection of cellular activity (see point 3).

      Please find our response below to the critique (2).

      (2) One direct way to test the dynamic range of the sensor in vivo is to increase or decrease endogenous gamma-secretase activity and to ensure this experimental design allows gamma-secretase activity to be monitored accurately. In the previous characterization of the reporter (Houser et al 2020), DAPT application and inhibition of gamma-secretase activity results in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, while the authors repeat the same manipulation and apply DAPT to block gamma-secretase activity, it seems to induce the opposite effect and reduces FR (comparing figure 8 with figures 5, 6, 7). First, there is no quantification comparing FR with and without DAPT. Moreover, it is possible to conduct this experiment in the same animals, meaning comparing FR before and after DAPT in the same mouse and cell populations. This point is absolutely critical: if indeed FR is reduced following DAPT application, this needs to be explained since this contradicts the basic design and interpretation of the biosensor.

      We appreciate the reviewer’s comment. In our hands, overexpression of the four γ-secretase components (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increase the cellular activity of γ-secretase, which we have successfully employed in vitro but not yet in vivo. Therefore, a γ-secretase inhibitor was used to determine the dynamic range of our FRET biosensor in vivo. FRET efficiency depends on the proximity and orientation of the donor and acceptor fluorescent proteins. In our initial study, we engineered the original C99 EGFP-RFP biosensor (C99 R-G), and the replacement of EGFP and RFP with mTurquoise-GL and YPet, respectively, expanded the dynamic range of the sensor approximately 2 times. Moreover, extending the linker length from 20 a.a. to 80 a.a. increased the dynamic range 2.2 times (Maesako et al., 2020 iScience). Of note, the C99 720-670 NIR analog, which has the same 80 a.a. linker but miRFP670 and miRFP720 as the donor and acceptor, exhibited a slightly better dynamic range than the C99 Y-T sensor (Houser et al., 2020 Sensors). Our interpretation, at that time, was that the cleavage of the C99 720-670 biosensor by γ-secretase results in a longer distance between the donor and acceptor, and thus, the FRET ratio always increases upon γ-secretase inhibition (i.e., proximity plays a more significant role than orientation in our biosensors). As expected, γ-secretase inhibitors significantly increased the FRET ratio in various cell types, including CHO, MEF, and BV2 cells and mouse cortical primary neurons. Moreover, to further ensure that the C99 720-670 biosensor records changes in γ-secretase activity, the multiplexing capability of the biosensor was utilized. Specifically, we co-expressed the C99 720-670 biosensor and visible-range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortical primary neurons. Strikingly, several biological events uniquely detected in the neurons with diminished endogenous γ-secretase activity, i.e., neurons with higher FRET ratios, are recapitulated by pharmacological inhibition of γ-secretase (unpublished observation). This approach has allowed us to ensure that increased FRET ratios are indicative of decreased endogenous γ-secretase activity in mouse cortical primary neurons.

      However, as recommended by the reviewer, we have performed a new experiment to compare the FRET ratio before and after administration of DAPT, a potent γ-secretase inhibitor, in the same mouse and cell populations. Surprisingly, we found that DAPT administration significantly decreases 720/670 ratios, which is included in our revised manuscript (Figure 2—figure supplement 2C). This unexpected FRET reduction by γ-secretase inhibition was also found in mouse primary neurons derived from the cerebellum (unpublished observation). These findings suggest that orientation plays a significant role in our γ-secretase FRET biosensor and that whether the FRET ratio is increased or decreased by γ-secretase-mediated cleavage depends on the cell type. Of note, the difference in FRET ratios with and without DAPT was comparable between primary cortical neurons (24.3%) and somatosensory cortex neurons in vivo (22.1%). Our new findings suggest that how our biosensors report γ-secretase activity (i.e., increased vs. decreased FRET ratio) must be examined on a model-by-model basis, which is clearly noted in the revised manuscript.

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 
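
      The same-cell before/after comparison described in the response to critique (2) comes down to simple ratio arithmetic. The sketch below is only an illustration, not the analysis pipeline used in the manuscript: the function names and intensity values are hypothetical, and it assumes that the percent difference is computed per cell relative to the pre-DAPT 720/670 ratio, which the response does not specify.

```python
from statistics import mean


def fret_ratio(acceptor_720, donor_670):
    """Ratiometric FRET readout: acceptor (720) emission divided by donor (670) emission."""
    if donor_670 <= 0:
        raise ValueError("donor intensity must be positive")
    return acceptor_720 / donor_670


def percent_change(before, after):
    """Mean paired percent change, relative to each cell's pre-treatment ratio."""
    if len(before) != len(after):
        raise ValueError("before and after must describe the same cells")
    return 100.0 * mean((a - b) / b for b, a in zip(before, after))


# Hypothetical (720, 670) intensities in arbitrary units; the same three cells
# are imaged before and after inhibitor administration.
pre = [fret_ratio(a, d) for a, d in [(1200, 1000), (1500, 1000), (900, 1000)]]
post = [fret_ratio(a, d) for a, d in [(960, 1000), (1200, 1000), (720, 1000)]]

# A negative value means the ratio dropped after treatment.
change = percent_change(pre, post)
print(f"mean change in 720/670 ratio: {change:.1f}%")
```

      Pairing the measurements per cell, rather than comparing population means from different animals, is what makes the sign of the change (increase vs. decrease) directly readable for a given model.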

      (3) For further validation, I would suggest including in vivo measurements with a sensor version with no biological activity as a negative control, for example, a mutation that prevents enzymatic cleavage and FRET changes. This should be used to showcase instrumental variability and would help to validate that the variability of FR is indeed biological in origin. This would significantly strengthen the claims regarding spatial correlation within populations of cells.

      We fully agree with the reviewer that having a sensor version containing a mutation that prevents enzymatic cleavage, and thus FRET changes, as a negative control is preferable. In our previous study, we developed and validated the APP-based C99 Y-T and Notch1-based N100 Y-T biosensors (Maesako et al., 2020 iScience). It is well established that Notch1 cleavage is entirely blocked by the Notch1 V1744G mutation (Schroeter et al., 1998 Nature; Huppert et al., 2000 Nature), and therefore, we introduced the mutation into the N100 Y-T biosensor and used it as a negative control. On the other hand, no such striking mutation has been identified in APP processing. To successfully monitor γ-secretase activity in deep tissue in vivo, we replaced Turquoise-GL and YPet in the C99 Y-T and N100 Y-T biosensors with miRFP670 and miRFP720, respectively. While the APP-based C99 720-670 biosensor allows recording of γ-secretase activity (Houser et al., 2020 Sensors), we found that the N100 720-670 sensor exhibits a very small dynamic range, which does not allow reliable measurement of γ-secretase activity. Taken together, there is currently no NIR γ-secretase biosensor lacking biological activity available.

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Huppert SS, Le A, Schroeter EH, Mumm JS, Saxena MT, Milner LA, Kopan R. Embryonic lethality in mice homozygous for a processing-deficient allele of Notch1. Nature. 2000 Jun 22;405(6789):966-70. 

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Schroeter EH, Kisslinger JA, Kopan R. Notch-1 signalling requires ligand-induced proteolytic release of intracellular domain. Nature. 1998 May 28;393(6683):382-6. 

      (4) In general, confocal microscopy is not ideal for in vivo imaging. Although the authors demonstrate that data collected using IR imaging increases penetration depth, out-of-focus fluorescence is still evident (Figure 4). Many previous papers have primarily used FLIM-based analysis in combination with 2p microscopy for in vivo FRET imaging (some examples: Ma et al, Neuron, 2018; Massengill et al, Nature Methods, 2022; Diaz-Garcia et al, Cell Metabolism, 2017; Laviv et al, Neuron, 2020). This technique does not rely on absolute photon number and therefore has several advantages in terms of quantification of FRET signals in vivo.

      It is therefore likely that use of previously developed sensors of gamma-secretase with conventional FRET pairs, might be better suited for in vivo imaging. This point should be at least discussed as an alternative.

      The reviewer notes that 2p-FLIM may provide certain advantages over our confocal spectral imaging approach for detecting in vivo FRET. In our response below, we will address both the FRET detection method (FLIM vs. spectral) and microscope modality (2p vs. confocal). 

      As noted by the reviewer, we do acknowledge that 2p-FLIM has been utilized to detect FRET in vivo. On the other hand, the ratiometric spectral FRET approach has also been utilized in many in vivo FRET studies (Kuchibhotla et al., 2008 Neuron; Kuchibhotla et al., 2014 PNAS; Hiratsuka et al., 2015 eLife; Maesako et al., 2017 eLife; Konagaya et al., 2017 Cell Rep; Calvo-Rodriguez et al., 2020 Nat Commun; Hino et al., 2022 Dev Cell). We think both approaches have advantages and disadvantages, as discussed in a previous review (Bajar et al., 2016 Sensors), but they complement each other. Indeed, we regularly employ FLIM in cell culture studies (Maesako et al., 2017 eLife; McKendell et al., 2022 Biosensors; Devkota et al., 2024 Cell Rep), and our recent study also utilized 2p-FLIM for in vivo NIR imaging (although not for detecting FRET) (Hou et al., 2023 Nat Biomed Eng); therefore, we are confident that 2p-FLIM can be adapted in our follow-up studies for γ-secretase recording.

      Regarding microscope modality, we agree with the reviewer’s point that two-photon microscopy can generally achieve larger penetration depths than confocal microscopy and is therefore better suited to in vivo FRET imaging. However, since our aim in this study was to quantify γ-secretase activity in the superficial layers of the cortex (<200 microns in depth), both NIR confocal and multiphoton microscopy could be used to achieve this imaging objective. Additionally, we chose to use confocal microscopy with our NIR C99 720-670 probe because of the probe’s slightly higher sensitivity compared to our C99 Y-T probe (Houser et al., 2020 Sensors). Imaging γ-secretase activity with our NIR C99 720-670 probe has the additional advantage that it will allow us, in future studies, to multiplex with visible FRET pairs using multiphoton microscopy in the same brain region. Furthermore, our demonstration of in vivo FRET imaging using NIR confocal microscopy avoids some of the issues associated with multiphoton microscopy, including potential phototoxicity due to high average and peak laser powers and the high complexity and cost of the instrumentation. For future studies aimed at interrogating γ-secretase activity in deeper cortical regions, multiphoton microscopy could be applied for FLIM or ratiometric spectral imaging of either our NIR or visible FRET probes. Per the reviewer’s request, we have added multiphoton FRET imaging as an alternative in the discussion section.

      Reference

      - Bajar BT, Wang ES, Zhang S, Lin MZ, Chu J. A Guide to Fluorescent Protein FRET Pairs. Sensors (Basel). 2016 Sep 14;16(9):1488.  

      - Calvo-Rodriguez M, Hou SS, Snyder AC, Kharitonova EK, Russ AN, Das S, Fan Z, Muzikansky A, Garcia-Alloza M, Serrano-Pozo A, Hudry E, Bacskai BJ. Increased mitochondrial calcium levels associated with neuronal death in a mouse model of Alzheimer's disease. Nat Commun. 2020 May 1;11(1):2146.

      - Devkota S, Zhou R, Nagarajan V, Maesako M, Do H, Noorani A, Overmeyer C, Bhattarai S, Douglas JT, Saraf A, Miao Y, Ackley BD, Shi Y, Wolfe MS. Familial Alzheimer mutations stabilize synaptotoxic γ-secretase-substrate complexes. Cell Rep. 2024 Feb 27;43(2):113761. 

      - Hino N, Matsuda K, Jikko Y, Maryu G, Sakai K, Imamura R, Tsukiji S, Aoki K, Terai K, Hirashima T, Trepat X, Matsuda M. A feedback loop between lamellipodial extension and HGF-ERK signaling specifies leader cells during collective cell migration. Dev Cell. 2022 Oct 10;57(19):2290-2304.e7.

      - Hiratsuka T, Fujita Y, Naoki H, Aoki K, Kamioka Y, Matsuda M. Intercellular propagation of extracellular signal-regulated kinase activation revealed by in vivo imaging of mouse skin. eLife. 2015 Feb 10;4:e05178.  

      - Hou SS, Yang J, Lee JH, Kwon Y, Calvo-Rodriguez M, Bao K, Ahn S, Kashiwagi S, Kumar ATN, Bacskai BJ, Choi HS. Near-infrared fluorescence lifetime imaging of amyloid-β aggregates and tau fibrils through the intact skull of mice. Nat Biomed Eng. 2023 Mar;7(3):270-280.  

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Konagaya Y, Terai K, Hirao Y, Takakura K, Imajo M, Kamioka Y, Sasaoka N, Kakizuka A, Sumiyama K, Asano T, Matsuda M. A Highly Sensitive FRET Biosensor for AMPK Exhibits Heterogeneous AMPK Responses among Cells and Organs. Cell Rep. 2017 Nov 28;21(9):2628-2638.  

      - Kuchibhotla KV, Goldman ST, Lattarulo CR, Wu HY, Hyman BT, Bacskai BJ. Abeta plaques lead to aberrant regulation of calcium homeostasis in vivo resulting in structural and functional disruption of neuronal networks. Neuron. 2008 Jul 31;59(2):214-25.

      - Kuchibhotla KV, Wegmann S, Kopeikina KJ, Hawkes J, Rudinskiy N, Andermann ML, Spires-Jones TL, Bacskai BJ, Hyman BT. Neurofibrillary tangle-bearing neurons are functionally integrated in cortical circuits in vivo. Proc Natl Acad Sci U S A. 2014 Jan 7;111(1):510-4.

      - Maesako M, Horlacher J, Zoltowska KM, Kastanenka KV, Kara E, Svirsky S, Keller LJ, Li X, Hyman BT, Bacskai BJ, Berezovska O. Pathogenic PS1 phosphorylation at Ser367. eLife. 2017 Jan 30;6:e19720.

      - McKendell AK, Houser MCQ, Mitchell SPC, Wolfe MS, Berezovska O, Maesako M. In-Depth Characterization of Endo-Lysosomal Aβ in Intact Neurons. Biosensors (Basel). 2022 Aug 20;12(8):663.

      Reviewer #2 (Recommendations For The Authors):

      (5) Minor issues- Figure 4 describes the analysis procedure, which seems to be standard practice in the field. This can be described in the methods section rather than in the main figure.

      Per the reviewer’s suggestion, this figure has been moved to Figure 2—figure supplement 1. 

      Reviewer #3 (Public Review):

      (1) This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state of the art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      We appreciate the reviewer’s overall positive evaluation. As described in our response to Reviewer 2’s critique (2), ΔF in vivo has been characterized (Figure 2—figure supplement 2C).

      (2) The observation of gamma-secretase signaling that spreads across cells is potentially quite interesting, but it can be better supported. An alternative interpretation is that there exist pre-formed and clustered hubs of high gamma-secretase activity, and that DAPT has stochastic or differential accessibility to cells within the cluster. This could be resolved by an experiment of induction, for example, if gamma-secretase activity is induced or activated at a specific locale and there was observed coordinated spreading to neighboring neurons with their sensor.

      We agree with the reviewer that stochastic or differential accessibility of DAPT to cell clusters with different γ-secretase activity can be an alternative interpretation of our data, which is now included in the Discussion of the revised manuscript. Undoubtedly, the activation of γ-secretase would provide valuable information. However, as described in the response above to Reviewer 2’s critique (2), overexpressing the four components of γ-secretase (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increasing the cellular activity of γ-secretase, which was achieved in our in vitro study but not yet in vivo. In future work, we will develop and characterize an approach to induce γ-secretase activity in order to perform detailed mechanistic studies.

      (3) Furthermore, to rule out the possibility that uneven viral transduction was simply responsible for the observed clustering, it would be helpful to see an analysis of 670nm fluorescence alone.

      Our new analysis comparing the 670 nm fluorescence intensity of each neuron with that of its five neighboring neurons shows a positive correlation (Figure 3—figure supplement 1A), suggesting that AAV was unevenly transduced. On the other hand, the 720/670 ratio (i.e., γ-secretase activity) is not correlated with 670 nm fluorescence intensity (i.e., C99 720-670 biosensor expression) (Figure 3—figure supplement 1B). This strongly suggests that, while C99 720-670 biosensor expression was not evenly distributed in the brain, the uneven probe expression did not affect the capability to record γ-secretase activity.
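
      The neighbor analysis described above can be sketched as a correlation between each cell's value and the mean value of its nearest neighbors. This is a minimal sketch under stated assumptions, not the authors' code: it assumes Euclidean nearest neighbors in the imaging plane, averaging over k = 5 neighbors, and a Pearson correlation; the synthetic grid and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def neighbor_means(positions, values, k=5):
    """For each cell, average the values of its k nearest other cells."""
    means = []
    for i, p in enumerate(positions):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(positions) if j != i
        )
        nearest = [values[j] for _, j in ranked[:k]]
        means.append(sum(nearest) / len(nearest))
    return means


# Synthetic example: cells on a 6 x 6 grid whose 670 nm intensity follows a
# left-to-right gradient, mimicking spatially uneven probe expression.
positions = [(x, y) for x in range(6) for y in range(6)]
intensity_670 = [float(x) for x, _ in positions]

r = pearson(intensity_670, neighbor_means(positions, intensity_670))
print(f"cell vs. neighbor-mean correlation: r = {r:.2f}")
```

      The same `pearson` helper can be applied per cell to the 720/670 ratio and the 670 nm intensity to check whether the readout depends on expression level.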

      Reviewer #3 (Recommendations For The Authors):

      (4) One minor suggestion might be to consider Figures 6-7 as orthogonal supporting analyses rather than "validation". It might then be helpful to present them together with Figure 5.

      We have moved the initial Figures 6 and 7 to Figure 3—figure supplement 2 and Figure 4, respectively.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My main concern is still in place. It is unclear whether the proposed method can find actual goal states, and as a result it is unclear what states it finds. Table S1 mentions the model BIOMD0000000454, which is a small metabolic pathway with known equations given in "Example One" in "Metabolic Control Analysis: Rereading Reder". In this model the goal states can be calculated analytically.

      Regarding your statements below: I am not concerned that your method will be less efficient than random search (or any other search..) on small models, but I think it is important for the readers to have evidence that your method is able to discover true goal states at least in small networks, used in your study. You do show that your method scales to complex models. So, in my opinion, the missing part is to show that it is able to find true goal states.

      "...For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models..."

      We thank you for your response and for raising the lack of evidence that our method is able to re-discover the true goal states of simple models when these are known a priori. We acknowledge that adding these simple cases is useful for completeness. We did not include these simple models in our main study because, in most cases, a basic random search over the initial conditions will lead to the re-discovery of these goal states. For instance, for the mentioned model BIOMD0000000454, described in "Example One" of the "Metabolic Control Analysis: Rereading Reder" paper, several simplifying assumptions are made such that the system has only one steady state (x1=0.056, x2=0.769, x3=4.231), which can be found analytically as shown in the paper. In that simple case, this goal state is also straightforward to find with numerical simulation, as any valid initial condition will converge to it.

      To address the concerns of the reviewer, we propose to add a "sanity check" figure to the supplementary material of the revised paper (Figure S4), as well as a “sanity check” subsection in the “Methods”, to present additional experiments on simple models such as this one. The new figure and subsection can be viewed in the paper’s interactive version, available online at https://developmentalsystems.org/curious-exploration-of-grn-competencies, and we plan to include them as such in the next revision. We have also included the full code to reproduce this sanity check as a ‘sanity_check.ipynb’ Jupyter notebook in the GitHub repository (https://github.com/flowersteam/curious-exploration-of-grn-competencies/blob/main/notebooks/sanity_check.ipynb).

      In the new Figure S4-b, we show the results of our exploration pipeline on the suggested model BIOMD0000000454 as described in "Example One" of the paper. These results provide evidence that the curiosity search is able to recover the correct unique goal state (x1=0.056, x2=0.769, x3=4.231), as expected.

      We also include a second sanity check on BIOMD0000000341, which models the dynamics of beta-cell mass, insulin, and glucose. This model has two stable fixed points representing physiological (B=300, I=10, G=100) and pathological (B=0, I=0, G=600) steady states, which are the known ground-truth steady states as described in Figure 3 of the "A Model of β-Cell Mass, Insulin, and Glucose Kinetics: Pathways to Diabetes" paper. Again, as expected, curiosity search is able to recover those two steady states (Figure S4-a).
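
      The two ground-truth steady states quoted above can be checked by hand: a fixed point is a state at which every right-hand side of the ODE system vanishes. The sketch below is not taken from the manuscript or its notebook. The equations and parameter values are written here as they are commonly reported for the β-cell mass, insulin, and glucose model and should be treated as assumptions to be verified against BIOMD0000000341; the function names are hypothetical.

```python
# Assumed model form and parameter values (to be checked against BIOMD0000000341):
#   dG/dt = R0 - (EG0 + SI * I) * G
#   dI/dt = B * SIGMA * G^2 / (ALPHA + G^2) - K * I
#   dB/dt = (-D0 + R1 * G - R2 * G^2) * B
R0, EG0, SI = 864.0, 1.44, 0.72
SIGMA, ALPHA, K = 43.2, 20000.0, 432.0
D0, R1, R2 = 0.06, 0.84e-3, 0.24e-5


def rhs(B, I, G):
    """Time derivatives (dB/dt, dI/dt, dG/dt) at the state (B, I, G)."""
    dG = R0 - (EG0 + SI * I) * G
    dI = B * SIGMA * G ** 2 / (ALPHA + G ** 2) - K * I
    dB = (-D0 + R1 * G - R2 * G ** 2) * B
    return dB, dI, dG


def is_steady(state, tol=1e-6):
    """True if all derivatives vanish (up to tol) at the given state."""
    return all(abs(d) < tol for d in rhs(*state))


physiological = (300.0, 10.0, 100.0)  # (B, I, G) quoted in the response
pathological = (0.0, 0.0, 600.0)      # (B, I, G) quoted in the response

print(is_steady(physiological), is_steady(pathological))
```

      This check only confirms that the states are fixed points under the assumed equations; it says nothing about their stability, which would require inspecting the Jacobian or simulating nearby trajectories.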

      As stated in our previous answer, our main study focuses on more complex models that are not limited to one or a few attractors that can easily be discovered with random initial conditions. Regarding the mentioned BIOMD0000000454, a possible source of confusion is that we did include it in our main study but, as specified in the caption of table S4, unlike what is done in "Example One" of the original paper, we let the metabolite concentrations y1,...,y5 evolve in time (instead of enforcing them as constants). When doing so, the resulting dynamics of the system are more complex and exhibit a spectrum of possible steady states (unknown a priori), in contrast to the previous case with a single steady state. In that case, the new attractors are not easy to find analytically, and the proposed curiosity search becomes valuable, as it is able to uncover the distribution of possible steady states much more efficiently than a random search baseline, as shown in the new Figures S4-c and S4-d.

      We hope that these new results will address the reviewer’s concerns and provide evidence to the readers on the validity of the approach on simple networks.

      eLife assessment

      This important study develops a machine learning method to reveal hidden unknown functions and behavior in gene regulatory networks by searching parameter space in an efficient way. The evidence for some parts of the paper is still incomplete and needs systematic comparison to other methods and to the ground truth, but the work will be of broad interest to anyone working in biology of all stripes since the ideas reach beyond gene regulatory networks to revealing hidden functions in any complex system with many interacting parts.

      We thank the editors and reviewers for their positive assessment and constructive suggestions. In our response, we acknowledge the importance of systematic comparison to other methods and to the ground truth, when available. However we also emphasize the challenges associated with evaluating such methods in the context of uncovering hidden behaviors in complex biological networks as the ground truth is often unknown. We hope that our explanations will clarify the potential of our approach in advancing the exploration of these systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This paper suggests to apply intrinsically-motivated exploration for the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, 'Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady-state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline, as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network "when the number of parameters is large and their uncertain range is not negligible". For the considered models, the true steady-state goal set is unknown, which is why we chose to compare against random search. We added a “Statistics” subsection in the Methods section providing additional details about the statistical analyses we perform between our method and the random search baseline.
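
      The random search baseline described above can be illustrated on a toy system: sample initial conditions uniformly, integrate the ODE, and collect the distinct endpoints. The sketch below is a hedged illustration rather than the baseline used in the paper. The one-dimensional bistable equation dx/dt = x - x^3, the forward Euler integrator, the sampling range, and all function names are assumptions chosen for brevity.

```python
import random


def simulate(x0, dt=0.01, steps=5000):
    """Integrate the toy ODE dx/dt = x - x**3 with forward Euler from x0."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x


def random_search(n_samples, seed=0, low=-2.0, high=2.0):
    """Sample random initial conditions and collect the rounded endpoints."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_samples):
        endpoint = simulate(rng.uniform(low, high))
        found.add(round(endpoint, 3))
    return found


steady_states = random_search(50)
print(sorted(steady_states))
```

      On such a low-dimensional system a handful of random samples is enough to reach every attractor, which is consistent with the authors' point that simple models leave little room for improvement over random search.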

      (2) The proposed method is based on 'Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] ('2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions, but rather only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios, which generally focus on the problem of robotic sequential decision-making and therefore assume state action trajectories as presented in Forestier et al. [65]. However, note that in both cases the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also sets only one vector at the start, which specifies the parameters of a movement (like the initial state of the GRN). The movement is then actually produced with dynamic motion primitives, which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting, at the start, the parameters of an action policy that would be called during the GRN’s trajectory to sample control actions conditioned on the current state of the GRN. In practice, this would also require setting only one vector at the start, so the exploration algorithm would remain the same and only the space of parameters would change, which illustrates the generality of the approach.
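
      The point that the IMGEP only sets one parameter vector per rollout can be made concrete with a generic goal-exploration loop: sample a goal in outcome space, reuse the stored intervention whose outcome is closest to that goal, perturb it, run the system, and store the result. The sketch below is a minimal sketch of this general scheme, not the authors' implementation. The toy system, uniform goal sampling, nearest-neighbor selection, Gaussian mutation, and all names are assumptions.

```python
import math
import random


def toy_system(theta):
    """Hypothetical stand-in for a GRN rollout: initial state -> reached endpoint."""
    x, y = theta
    return (x ** 3, y ** 3)


def imgep(n_bootstrap=10, n_iter=200, sigma=0.1, seed=0):
    """Goal exploration over interventions that only set the initial state."""
    rng = random.Random(seed)
    history = []  # list of (intervention, outcome) pairs

    # Bootstrap phase: random interventions, as in a plain random search.
    for _ in range(n_bootstrap):
        theta = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        history.append((theta, toy_system(theta)))

    for _ in range(n_iter):
        # 1. Sample a goal in outcome space (here uniformly in [-1, 1]^2).
        goal = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        # 2. Pick the past intervention whose outcome is closest to the goal.
        theta_near, _ = min(history, key=lambda h: math.dist(h[1], goal))
        # 3. Mutate it, clipping to the valid range of initial states.
        theta = tuple(
            min(1.0, max(-1.0, t + rng.gauss(0.0, sigma))) for t in theta_near
        )
        # 4. Run the system once from that initial state and store the result.
        history.append((theta, toy_system(theta)))
    return history


history = imgep()
print(len(history), "rollouts stored")
```

      Replacing `toy_system` with a function that rolls out an action policy parameterized by `theta` would leave the loop unchanged, which is the sense in which only the parameter space differs between the two settings.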

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of a GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end, we have used the fixed initial conditions provided in BIOMD0000000647, replicating Figure 5 of Cho et al. [56].

      While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that reach more distant points in the same phase space (up to ~10 in RKIPP_RP levels and ~300 in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be “rarer” to occur naturally, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a. Where is 'effective intervention' used in the method?

      b. In my opinion, 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important, I would suggest extending/enhancing the column "Proposed Isomorphism". Otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change in the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more so than random exploration. However, we agree that the term “effective intervention” is ambiguous (it does not say effective at what) and we have replaced it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we have clarified those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the extent to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, both measure the range of states that can be reached by the system under a distribution of stimuli/conditions, whether natural or engineered, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We have replaced the verb “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete in certain respects, which we tried to make as explicit as possible throughout the paper and which is why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use cases in which the community could benefit from such computational methods and experimental framework, and on which future work could build.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of methods other than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape from an input set of recorded dynamical trajectories, although such methods may well exist. We want to emphasize that any such method would in any case depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations to the GRN’s behavior, and that these can shape the landscape in various ways, for instance by affecting the paths that are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking us to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that, given this problem formulation, a simple random search was used as the optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work when starting from a random or bad initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.
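
      The problem formulation above can be sketched on a toy example. This is an illustrative assumption, not the GRN model or the code used for Figure 8: a one-dimensional bistable system stands in for the GRN, its negative well for the “disease” region, and a constant push of given amplitude and duration for the intervention. The names `run`, `resets_all`, `random_search` and `DISEASE_STARTS` are hypothetical.

```python
import numpy as np

# Hypothetical starting points, all in the "disease" basin (x < 0).
DISEASE_STARTS = [-1.2, -1.0, -0.8, -0.5]

def run(x0, amplitude, duration, t_total=10.0, dt=0.01):
    """Euler-integrate dx/dt = x - x^3 + push(t) from x0.

    The push equals `amplitude` while t < duration and 0 afterwards.
    Without a push, the system has attractors at x = -1 and x = +1.
    """
    x = x0
    for k in range(int(t_total / dt)):
        push = amplitude if k * dt < duration else 0.0
        x += dt * (x - x**3 + push)
    return x

def resets_all(amplitude, duration):
    """True if one and the same intervention moves every start to x > 0."""
    return all(run(x0, amplitude, duration) > 0.0 for x0 in DISEASE_STARTS)

def random_search(n_trials=50, seed=0):
    """Sample interventions uniformly and return the first robust one."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        amplitude = rng.uniform(0.0, 2.0)
        duration = rng.uniform(0.0, 5.0)
        if resets_all(amplitude, duration):
            return amplitude, duration
    return None  # no robust intervention found within the budget

found = random_search()
```

      The set of starting points is what a behavioral catalog would supply in this formulation; a more efficient optimizer could replace `random_search` without changing the objective.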

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have analyzed this issue more thoroughly, such as Colas et al. [74] and Pugh et al. [34], and have shown the value of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in the revised version in order to complete our analysis. CMA-ES is an evolutionary algorithm that self-adapts its optimization steps and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we have only 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (initial mean and standard deviation) but always found similar results. We have updated the Figure 9 image and caption, as well as the descriptive text, to include these new results in the revised version. We also added a reference to the CMA-ES paper in the citations.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest conducting a more rigorous analysis of the performance by estimating/approximating the ground truth robust goal sets in important GRNs.

      Also, the use of terminology from different disciplines can be improved. Please see my comments above. Specifically, the connection between controllability in dynamical control systems and versatility used in this paper is unclear.

      We hope to have addressed the reviewer's concerns in our previous answers.

      Reviewer #2 (Recommendations For The Authors):

      Fig 4b: I'm not sure if DBSCAN is the appropriate method to use here, as the visual focus on the core elements of the clusters downplays the full convex hull of the points that random sampling achieves in Z space. An analysis based on convex hulls or the ball-coverage from Fig. 3b would presumably generate plots that were more similar between random sampling and curiosity search. If the goal is to highlight redundancy/non-linearity in the mapping between Z and I, another approach might be to simply bin Z-space in a grid, or to use a clustering algorithm that is less stringent about core/noise distinctions.

      We thank the reviewer for the suggestion. This plot is intended to give the reader an understanding of why a method that uniformly samples goals in Z (what the IMGEP is doing) is more efficient than a method that uniformly samples parameters in I (what the random search is doing) in systems for which there is high redundancy/non-linearity in the mapping between I and Z. We agree that binning the Z-space in a grid and counting the number of achieved bins is a way to measure this quantitatively, and it is in fact very close to what we do in Figure 3 to measure the achieved diversity. We believe, however, that the clustering and coloring provide additional intuition about why this is the case: they illustrate that large regions of the intervention space map to small regions in the outcome space and vice versa.
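
      To make the grid-based measure concrete, the sketch below counts occupied bins in Z for the two sampling strategies on a toy redundant mapping. Everything here is an assumption made for illustration: the `system` function, the 10x10 grid, the nearest-neighbour goal selection and the Gaussian mutation are hypothetical stand-ins, not the GRN models or the exact IMGEP variant from the paper.

```python
import numpy as np

def system(i):
    """Toy mapping from I = [0,1]^2 to Z = [0,1]^2; most of I lands near the origin of Z."""
    return np.clip(i, 0.0, 1.0) ** 8

def occupied_bins(Z, n_bins=10):
    """Number of distinct cells of an n_bins x n_bins grid over [0,1]^2 hit by the outcomes Z."""
    idx = np.minimum((np.asarray(Z) * n_bins).astype(int), n_bins - 1)
    return len({tuple(row) for row in idx})

def random_search(n, rng):
    """Uniformly sample parameters in I and return the resulting outcomes."""
    return system(rng.uniform(0.0, 1.0, size=(n, 2)))

def goal_search(n, rng, n_init=20, sigma=0.05):
    """Uniformly sample goals in Z, then mutate the parameters of the nearest known outcome."""
    I = list(rng.uniform(0.0, 1.0, size=(n_init, 2)))
    Z = [system(i) for i in I]
    for _ in range(n - n_init):
        goal = rng.uniform(0.0, 1.0, size=2)
        nearest = int(np.argmin(np.linalg.norm(np.array(Z) - goal, axis=1)))
        child = np.clip(I[nearest] + rng.normal(0.0, sigma, size=2), 0.0, 1.0)
        I.append(child)
        Z.append(system(child))
    return np.array(Z)

rng = np.random.default_rng(0)
div_random = occupied_bins(random_search(300, rng))
div_goal = occupied_bins(goal_search(300, rng))
```

      Comparing `div_random` and `div_goal` for the same budget gives a single-number summary of coverage in Z; it does not show which regions of I the outcomes came from, which is what the clustering view adds.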

      Additional changes in the revised version:

      We added a sentence in the Methods section, as well as in the caption of Table S1, providing additional details about the way we simulate the biological models from the BioModels website.

      We corrected an erroneous reference to Figure 4 in the Methods “Sensitivity measure” subsection; it now refers to Figure 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The process of EMT is a major contributor to metastasis and chemoresistance in breast cancer. By using a modified PyMT model that allows the identification of cells undergoing EMT and their descendants via S100A4-Cre mediated recombination of the mTmG allele, Ban et al. tackle a very important question of how tumor metastasis and therapy resistance by EMT can be blocked. They identified that pathways associated with ribosome biogenesis (RiBi) are activated during transition cell states. This finding represents a promising therapeutic target to block any transition from E to M (activated during cell dissemination and invasion) as well as from M to E (activated during metastatic colonization). Inhibition of RiBi-blocked EMT also reduced the establishment of chemoresistance that is associated with an EMT phenotype. Hence, RiBi blockage together with standard chemotherapy showed synergistic effects, resulting in impaired colonization/metastatic outgrowth in an animal model. The study is of great interest and of high clinical relevance as the authors show that blocking the transition from E to M or vice versa targets both aspects of metastasis, dissemination from the primary tumor, and colonization in distant organs.

      We appreciate the positive acknowledgment of our work.

      The study is done with high skill using state-of-the-art technology and the conclusions are convincing and solid, but some aspects require some additional experimental support and clarification. It remains elusive whether blocking of EMT/MET is necessary for the synergistic effect of standard chemotherapy together with RiBi blockage or whether a general growth disadvantage of RiBi-treated cells independent of blocking transition is responsible. 

      We thank the reviewer for raising the pertinent question of how EMT/MET blocking by RiBi inhibition relates to its synergistic effect with chemotherapy drugs. Our experimental data suggest that the synergy is likely a consequence of the EMT/MET block. Specifically, when assessing the potency of RiBi inhibitors (BMH21 and CX-5461), we observed a pronounced EMT/MET blocking effect at concentrations below those at which cytotoxic effects emerged (refer to Fig. 4 and Supplementary Fig S8). Notably, the IC50 for BMH21 was approximately 200 nM, a concentration higher than those at which the EMT/MET blocking effects appeared. Crucially, the enhanced synergy of RiBi inhibitors with chemotherapy drugs was predominantly seen at these lower concentrations (as illustrated in Supplementary Fig S10). Therefore, the EMT/MET blocking by RiBi inhibition, rather than the cytotoxic effect, is likely instrumental for the synergy with chemotherapy drugs. This result is highlighted on page 16.

      How can specific effects on state transition by RiBi block be separated from global effects attributed to overall reduced protein biosynthesis, proliferation etc.?

      We appreciate the reviewer's insightful query. We agree that RiBi activity and the associated protein synthesis are fundamental processes for cell viability, making it challenging to clearly distinguish the overall effects of RiBi blockage from its specific effects on EMT state transition. Our results showed elevated RiBi activity during the EMT transitioning phases, concomitant with enhanced nascent protein synthesis, indicating a higher-than-normal requirement for new proteins when cells switch their phenotype. This gives us an opportunity to target the excessive RiBi activity in order to block EMT/MET transition. Based on a similar consideration, we chose to apply shRNA instead of CRISPR technology to modulate RiBi gene expression. Compared with scrambled controls, the growth rates of the Rps knockdown cells (both RFP+ and GFP+ cells) were not significantly affected, while the EMT/MET transitioning was impaired (Supplementary Fig 9). These results may provide evidence that inhibiting the RiBi pathway uncouples cell proliferation from EMT/MET status changes.

      Some other aspects are misleading or need extension. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The analysis of RiBi expression during EMT in Fig. 1K shows that transition states have high RiBi levels, whereas E and M states are low. Analyses of MET in Fig.2G indicate that M states have the lowest, transition states upregulate RiBi while E states have the highest levels of RiBi expression. This is puzzling and how can it be explained? It would be helpful to demonstrate how these two settings are related by combining results from Figs 1 and 2 in an E-Trans-M-Trans-E state graph (in a sequence of EMT/MET). Does it mean that the initial E state starts with lower RiBi and the final E state displays the highest RiBi expression? In other words, are the initial E state and the one after MET different? 

      We thank the reviewer for raising the question of which EMT/MET state exhibits the highest RiBi activity. Following the reviewer's suggestions, we merged the scRNA-seq data of EMT and MET cells and performed the trajectory analysis. Similar epithelial-mesenchymal spectrums were detected in these cells (For reviewers Fig 1). Notably, the highest RiBi activity was detected in the early EMT transitioning or the late MET transitioning cells (revised For reviewers Fig 1D). To address the reviewer's question, the initial E state (of EMT cells) did not differ significantly from the final E state (of MET cells) in terms of EMT pseudotime and RiBi activity. In addition, the analysis with merged cells also revealed:

      (1) Both the EMT (In_Vitro_Mix) and MET (In_Vivo_GFP) cells were generally divided into two major clusters representing epithelial and mesenchymal phenotypes (For reviewers Fig 1A, 1B).

      (2) The EMT and MET cells exhibited similar EMT spectrums (EMT/MET status, and pseudotime) in the trajectory analysis (For reviewers Fig 1C, 1D).

      (3) Cells with high RiBi activity were mostly transitioning cells from the EMT (In_Vitro_Mix) population (For reviewers Fig 1D).

      (2) It needs to be elaborated on how the experiment in Fig. 4A was exactly done. Are there cells isolated directly from the autochthonous TriPyMT tumor in contrast to steady-state cultures from Fig. 1? Does the control graph represent 0d in culture or have the cells been cultured for the same amount of time as the treated samples? How are these observed 15% GFP+ cells related to the 15% GFP+ cells obtained at day 0 and 34% at d7 control condition in Fig. 5A?

      Following the reviewer’s suggestion, we have amended the figure legend to clarify the experimental settings. In Fig. 4A, we initiated the experiment with sorted RFP+/Epcam+ cells. The control cells were cultured for the same period of time (5 days) as the drug-treated cells. We apologize for the unclear description. The percentage of GFP+ cells in this experiment is not related to the experiment in Fig 5A, where the initial cell population comprised an unsorted mix of RFP/GFP cells.

      (3) Fig. 4B: Since the bulk population is loaded in the WB, does that suggest that the epithelial state is stabilized/enhanced or does it reflect only different cell ratios? So, it would be important to show the WB for RFP+ and GFP+ cells separately. 

      We thank the reviewer for the query regarding Fig. 4B and apologize for the unclear explanation. The experimental setup for Fig 4B was identical to that of Fig 4A, where sorted RFP+ cells were used at the start. Indeed, the observed increase in epithelial markers and decrease in mesenchymal markers in cells treated with BMH and CX suggest a higher proportion of cells maintaining the RFP+ state.

      Performing WB for RFP+ and GFP+ cells separately may not address this question, since the experiment was initiated with pure RFP+ cells. Also, the expression of the fluorescent markers is closely aligned with the EMT status of the cells with and without drug treatment.

      (4) Figs. 4-6: The authors claim that there is less EMT under treatment. If the experiment was done over 5 days (as indicated in Fig.4b legend), it is necessary to rule out that shifts in E/M ratios are attributed to the effects of treatment on proliferation/survival affecting both populations differently. How do the same cells grow under treatment when injected orthotopically/subcutaneously? 

      We apologize for the unclear descriptions. The experiments on blocking EMT transitioning with RiBi inhibitors were performed with purified RFP+/EpCam+ cells. All GFP+ cells in this experimental setting were derived from RFP+ cells. Given that the fluorescence switch correlated well with the EMT status of cells, RFP and GFP were used as EMT reporters. Similarly, we used purified GFP+/EpCam- cells as the initial population to study the MET process of tumor cells.

      To address the reviewer's concern regarding how RiBi inhibition may differentially affect the growth of RFP+ and GFP+ cells, we conducted a cell cycle assay using Tri-PyMT cells, which include both RFP+ and GFP+ populations. Our results demonstrated that both RFP+ and GFP+ cells exhibited a trend towards G2/M phase accumulation when treated with BMH21. It is important to note that the impact of BMH21 on the cell cycle was less pronounced than previously reported by Fu et al. (Oncol Rep, 2017). This is likely because the dose used for EMT inhibition in our study was approximately one-tenth of the dose known to inhibit cell growth (For Reviewers Fig 2). Also, no significant differential impact was detected between RFP+ and GFP+ cells.

      We have previously characterized the proliferation rate of RFP+ and GFP+ populations (Lourenco et al 2020). RFP+ cells proliferate faster than GFP+ cells. Primary tumor cells derived from RFP+ cells also grew faster than GFP+ tumors (Lourenco et al 2020).

      (5) Fig. 6B: this image is puzzling. Only in the lower two panels the outline of the lung is visualized by DAPI staining. The upper two panels look like there is no lung tissue in ctrl (no DAPI+GFP-RFP- cells) or show almost exclusively DAPI+GFP-RFP- cells that are present in a clustered assembly. Do the latter represent lymphoid cell clusters or normal lung tissue? 

      To improve the clarity of the fluorescent images in Fig 6B, we enlarged the merged images and increased their contrast (Revised Fig. 6B). The DAPI+/RFP-/GFP- region represents normal lung tissue. Nodules with either RFP or GFP signals represent tumor lesions.

      (6) Text: Several typos and sentences should be revised, including p. 3 "Le et al. discovered" which should read as "Li et al. discovered", p.8 "Vimten", p.10 "Cells were then classified cells into three main categories", GSEA should be spelled out as Gene Set Enrichment Analysis (not Assay), p. 13 "cells, suggesting the impaired MET capability with upon treatment". 

      We apologize for the typos. All were corrected in the revised manuscript.

      (7) Figures: Color gradient indicator in Fig. 1E does not reflect the colors of the cells, Fig. S5A+C are not referenced in the text, there is mislabeling of S5B,C,D in the legend, graph in Fig. 3D is placed two times and overlapping, Fig. 6C labeling needs adjustments, labeling of Fig. 6D should be similar to Fig. 6A: CTX blue and BMH21 green. 

      We apologize for these errors and have corrected them. The color in Fig. 1E represents the EMT status of tumor cells as indicated in the revised figure: red for more epithelial and green for more mesenchymal features. Fig S5 is now Fig S6 and is referenced in the revised manuscript. The figure legends were corrected. The labels of Fig 6 were adjusted.

      Reviewer #2 (Public Review): 

      (1) The current manuscript by Ban et al describes that cells undergoing EMT have increased rRNA synthesis, as analyzed by RNA seq-based gene expression analysis, and that the increased rRNA synthesis provides a therapeutic opportunity to target chemoresistance. The cells utilized in this manuscript were isolated from the authors' Tri-PyMT EMT lineage tracing model published a few years ago which demonstrated that cells undergoing EMT are not the cells that are contributing to metastasis but rather to tumor chemoresistance (Fischer, Nature 2015). This in vivo model has since then been criticized for not capturing all relevant EMT events which the authors also acknowledge in the introduction. The authors therefore reason that they use this lineage tracing model to better understand the role of EMT in chemoresistance. 

      A major problem with the current manuscript is that the authors present many of their findings as novel without the proper acknowledgment of previously published literature in particular, Prakash et al., Nature Communications, 2019 and Dermitt, Dev Cell, 2020. In the studies by Prakash, the authors demonstrate that maintaining ongoing rRNA biogenesis is essential for the execution of the EMT program, and thus the ability of cancer cells to become migratory and invasive. Further, Prakash et al showed that blocking rRNA biogenesis with a small molecule inhibitor, CX-5461 (which is also used in the study by Ban et al) specifically inhibits breast cancer growth, invasion, EMT, and metastasis in animal models without significant toxicity to normal tissues. As such a significant revision that is necessary at this time is a rewrite of the manuscript especially the introduction and the discussion to more accurately describe and cite previously published findings and then highlight the current work by Ban et al which nicely builds on the previously published literature as it highlights the contribution of EMT to chemoresistance rather than metastasis. The suggestion for the authors is that they therefore should focus on highlighting the chemotherapy resistance angle as their Tri-PyMT EMT lineage tracing was chosen to test this angle and as such focus on both primary tumor growth and metastasis.

      We appreciate the reviewer’s insightful feedback. In response, we have revised a section in the discussion to better highlight how our study builds upon and extends the work of others. We acknowledge that the link between ribosome biogenesis (RiBi) and the epithelial-mesenchymal transition (EMT) pathway was noted in prior research (Prakash et al. 2019; Ebright et al. 2020). In the revised manuscript, we have included additional discussion of this topic. Our findings, however, contribute to this knowledge by showing increased RiBi activity during both EMT and mesenchymal-epithelial transition (MET) processes, thereby deepening our understanding of its role. Additionally, we have clarified our novel stance on EMT-targeting strategies. Rather than solely targeting the mesenchymal phenotype, we propose that inhibiting the phenotypic switching ability of tumor cells (a round trip encompassing both EMT and MET) could be more effective, as described in the introduction.

      Additional major revisions: 

      (2) The authors use the FSP1-Cre Model which in the field has been questioned as to not capture all the relevant EMT events and therefore their findings should be corroborated by another EMT model system. 

      We agree with the reviewer that the Fsp1-Cre model cannot capture ALL the relevant EMT events. However, the fidelity and accuracy of the Fsp1-Cre model in reporting the EMT process of Tri-PyMT cells have been demonstrated in our previous studies (Lourenco et al. 2020). We have also included additional results to further characterize this model: 1) Continuous fluorescence switching from RFP+ to GFP+ was observed in Tri-PyMT cells (Supplementary Fig S1); 2) Bulk RNA-seq data showed differential expression of EMT marker genes between the RFP+ and GFP+ cells (Supplementary Fig S2A); 3) Single-cell RNA-seq data showed the EMT spectrum and EMT status distributions according to Fsp1(S100a4)/Epcam and Vim/Krt18 expression (revised Supplementary Fig S3B, 3C). We hope these results address the reviewer’s doubts about the Fsp1-Cre model in reporting EMT of tumor cells. Of note, the evaluation of EMT status with RiBi activity does not rely solely on the fluorescent marker switch but on the EMT-related transcriptome (EMTome) of the Tri-PyMT cells.

      Again, we agree with the reviewer that the Tri-PyMT model does not report ALL relevant EMT events. In the manuscript, we have included experiments with MDA-MB-231-LM2 cells (Fig 6D) and analyzed sequencing databases of breast cancer patients (revised Supplementary Fig S13, S14) to validate the association between EMT status and RiBi activity.

      (3) In the current version of the manuscript, there are no measurements of rRNA synthesis, but the gene expression profiles are used as a proxy for rRNA synthesis. The authors therefore need to include measurements of rRNA synthesis corroborating the RNA sequencing data to support their scientific findings and claims. This can be accomplished by qPCR, Northern blot, or EU staining of the respective sorted cell population. Quantification of rRNA synthesis is also needed for the CX5461/BMH-21 and silencing studies. 

      We agree that direct measurement of rRNA synthesis is important to validate the association of RiBi activity with the EMT/MET process. Following the reviewer’s suggestion, we performed an EU incorporation assay with RFP+, Double+, and GFP+ Tri-PyMT cells with and without RiBi inhibitors. Under the treatment-naïve condition, the Double+ (EMT-transitioning) cells exhibited the highest rRNA synthesis activity compared with either RFP+ (E) or GFP+ (M) cells (revised Supplementary Fig S7). Also, as expected, treatment with BMH21 or CX-5461 significantly inhibited rRNA synthesis (revised Supplementary Fig S8B).

      (4) Currently, there is no mechanistic insight as to how rRNA synthesis is increased during EMT, which would also strengthen the manuscript. This could be done through targeted ChIP analysis. 

      The experimental data in the current manuscript suggest that the activation of RiBi is upstream of the EMT process, as the impaired RiBi pathway hinders the EMT of tumor cells. We are uncertain about the suggestion regarding ChIP analysis. If the reviewer refers to ChIP analysis with EMT transcription factors (i.e., Snail, Twist, and Zeb1), it may not elucidate the mechanisms by which the EMT process is associated with rRNA synthesis. Using sorted GFP/RFP double-positive Tri-PyMT cells, we found enhanced activation of the ERK and mTOR pathways in the EMT-transitioning cells (Figure 3A). It is well documented that the ERK and mTOR pathways are key coordinators of EMT (Xie et al., Neoplasia 2004; Shin et al., PNAS 2019; Lamouille et al., J. Cell Sci. 2012; Roshan et al., Biochimie 2019). Interestingly, we also observed significantly higher phosphorylation of rpS6, a downstream indicator of mTOR pathway activation, in the Doub+ cells. As rpS6 is an indispensable ribosomal protein, its phosphorylation could affect ribosomal function in protein translation (Bohlen et al., Nucleic Acids Res. 2021; Mieulet et al., 2007).

      (5) rRNA synthesis has canonically been linked to the cell cycle therefore it will be necessary for the authors to determine the cell cycle state of their respective cell populations throughout the manuscript. 

      Following the reviewer's suggestion, we analyzed the cell cycles of RFP+, GFP+, and Doub+ Tri-PyMT cells. Our analysis revealed that the proportion of proliferating RFP+ cells (in the S phase) was higher than that of proliferating GFP+ cells. Interestingly, the Doub+ cells also exhibited a higher ratio of proliferation, which was significantly greater compared to both RFP+ and GFP+ cells (revised supplementary Figure S1B).

      (6) Statistics and quantifications are currently missing in several figures and need to be better explained throughout the manuscript to strengthen the scientific rigor of the studies. 

      We have improved the clarity of our manuscript. The statistical descriptions of all experiments have been carefully reviewed, and the necessary information has been added to the revised manuscript.

      (7) Only metastasis studies are shown in the current version of the manuscript. These studies should be complemented with primary tumor studies as the main focus of the paper is the contribution of EMT to chemoresistance. 

      We appreciate the reviewer's suggestion regarding the primary tumor studies, and we apologize for not stating our rationale clearly in the manuscript. In response, we have revised the manuscript to outline the rationale for establishing a competitive model by injecting a mixture of RFP+ and GFP+ cells in a 1:1 ratio via the tail vein. This model is designed to study both EMT and MET processes under chemotherapy at a distal site, where tumor cells need phenotypic switches (both EMT and MET) to adapt to and overcome chemotherapeutic and environmental challenges. We have also studied primary tumor growth with the pre-EMT (RFP+) and post-EMT (GFP+) cells; their differential contribution to tumor growth was published in another paper (Lourenco et al., Cancer Res 2020).

      Reviewer #2 (Recommendations For The Authors): 

      Figure 1 and associated supplementary figure panels 

      Fig. 1A. More details are needed about the Tri-PyMT model and the induction of EMT in vitro. The authors mention that when growing the isolated cells they spontaneously undergo EMT when grown in 10% FBS. What is the timeline for this transition and how reproducible is it? This information is not clear from Supp. 1. When were cells taken for analysis and also how long is plasticity maintained? According to Supp 1. cell generation 15-21 seems to have a stable cell population of green, red, and yellow cells. Are these cell populations changing if one stimulates the whole cell population with a pro-EMT stimulus? Since cell proliferation is linked to rRNA synthesis the authors also need to include markers of cell cycle for the individual cell population to identify which cell cycle state each sorted cell population is associated with. 

      We thank the reviewer for recommending further analysis of the cell cycle among RFP+, GFP+, and Doub+ cells. As illustrated in the revised Supplementary Figure 1B, a higher proportion of RFP+ cells than GFP+ cells was observed in the S phase. Moreover, Doub+ cells demonstrated a proliferation rate even higher than that of RFP+ cells.

      Upon sorting, RFP+ cells were found to spontaneously undergo epithelial-mesenchymal transition (EMT) when cultured in 10% FBS media, thereby converting to GFP+. We quantified the GFP+ cell percentage within the total cell population, noting a consistent transition of a certain proportion of RFP+ cells to EMT, leading to an accumulation of GFP+ cells. This accumulation stabilizes as approximately 60-70% of the entire population become GFP+. Remarkably, re-sorting RFP+ cells from this balanced tumor cell population resulted in a similar fluorescent transition pattern as observed in the parental population. The mechanisms by which tumor cells regulate the EMT phenotypes across the entire population remain unclear. Nevertheless, the equilibrium between RFP+ and GFP+ cells may be attributed in part to the more rapid proliferation of RFP+ cells and the limited proportion of tumor cells undergoing EMT.

      We conducted repeated long-term cultures (up to 20 passages) of the Tri-PyMT cells, yielding consistent results. The fluorescence transition pattern in Tri-PyMT cells proved highly reliable. Further details regarding the Tri-PyMT cells have been incorporated into the Methods section.

      Fig. 1B. The loading control is not even and quantification is missing, in the text, it states Vimten instead of Vimentin. 

      The lower loading of Doub+ cells was due to the limited number of EMT-transitioning cells we could purify by flow sorting. Even so, the expression of both epithelial and mesenchymal markers in the Doub+ cells was clear. In the revised manuscript, we have quantified the Western blot results. We also apologize for the typographical error and have corrected the spelling of "Vimentin."

      Fig. 1K. In this figure, the authors write: 'It is worth noting that with the 2-phase classifications (Epi or Mes), the elevated RiBi activity was associated with the transitioning cells still exhibiting overall epithelial phenotypes; RiBi activities diminished as cells completed their transition to the mesenchymal phase'. But in Fig. 1K, the Ribi activity is already at a peak during the epithelial state and starts declining already at the beginning of the transition, can the authors please explain this data a bit more? The finding that ribosome biogenesis diminishes once the cells have completed their transition was shown in Prakash et al, Fig. 1 J, I, and accordingly their scientific findings should be discussed in the context of published work. 

      We acknowledge the reviewer's concerns regarding the comparison of the timeline for EMT in our model with that in Prakash's study. In our model, EMT-transitioning cells are identified by their EMT marker genes and fluorescence expression. We enriched the EMT-transitioning cells by sorting the Doub+ cells. Due to the RFP protein's half-life, cells remain RFP+ for 2-3 days after the reporter cassette has switched to GFP expression. In Prakash's study, the EMT-transitioning phase was defined by the duration of TGF-β stimulation.

      In Figure 1K, cells are categorized based on their EMT pseudotime, calculated from their expression of EMT marker genes in the EMTome. Ribosome biogenesis (RiBi) activity is highest in cells transitioning between phase 1 (Red) and phase 2 (Green), with both phases displaying predominantly epithelial phenotypes (Figures 1C, 1D, and 1E). RiBi activity declines in cells in phases 4, 5, and 3, which exhibit a mesenchymal phenotype. We have expanded the discussion to include more details in comparison with Prakash's study in the revised manuscript.

      Supp Fig S4. The authors should provide a rationale for how and why the specific marker genes were selected to calculate the AUC values. 

      We chose the specific EMT marker genes based on their overall expression levels in Tri-PyMT cells, ensuring consistency with the associations between their expression patterns and epithelial or mesenchymal phenotypes reported in the literature. We provide a detailed rationale for the selection of these genes in the Methods section of the revised manuscript (Page #7).

      Figure 2 and associated supplementary figure panel. In this figure, rRNA synthesis needs to be evaluated in the cells isolated from the lungs to corroborate the RNA sequencing findings. 

      Following the reviewer’s suggestion, we performed RT-PCR of RiBi-related genes including Bop1, Gemin4, Its1, Its2, Npm1, Rpl8, Rpl29, Rps9, Rps24, Rps28, Polr1a, Setd4, Utp6, and Xpo1. Consistent with the bulk and single-cell RNA sequencing, relatively higher expression of RiBi-related genes was detected in Doub+ cells compared with RFP+ and GFP+ cells (revised Supplementary Fig S5).

      Fig 2C, as per figure Supp Fig S4 please explain the rationale for how and why the specific marker genes were selected. 

      The same marker genes were used to calculate the EMT AUC value as in Fig. 1. These marker genes were selected because their overall expression levels are readily detectable in Tri-PyMT cells, their expression patterns are consistent with their epithelial or mesenchymal phenotypes, and the associations between marker gene expression and phenotype are in line with previous reports in the literature. A description of the AUCell value quantification is included in the revised manuscript (Page #7).
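To make the idea of an AUC-based marker score concrete, the following is a minimal sketch of a rank-based gene-set score for a single cell. It is a simplified illustration in the spirit of AUCell, not the AUCell package's implementation and not the authors' pipeline; the function name `auc_score`, the `max_rank` cutoff, and the expression values are all hypothetical.

```python
def auc_score(expression, gene_set, max_rank):
    """Rank-based enrichment score of a gene set in one cell (illustrative).

    expression: dict mapping gene name -> expression value for one cell.
    gene_set:   iterable of marker genes (e.g. epithelial markers).
    max_rank:   only the top `max_rank` ranked genes are considered.
    Returns a value in [0, 1]; 1 means every marker sits at the top of the ranking.
    """
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    targets = set(gene_set) & set(expression)
    if not targets or max_rank <= 0:
        return 0.0

    # Area under the recovery curve: running count of markers seen so far.
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in targets:
            hits += 1
        area += hits

    # Largest possible area: all markers occupy the top ranks.
    n = min(len(targets), max_rank)
    max_area = sum(min(i, n) for i in range(1, max_rank + 1))
    return area / max_area


# Hypothetical expression values for one epithelial-like cell.
cell = {"Cdh1": 10.0, "Epcam": 8.0, "Vim": 1.0, "Zeb1": 0.5}
epi_score = auc_score(cell, ["Cdh1", "Epcam"], max_rank=3)
mes_score = auc_score(cell, ["Vim", "Zeb1"], max_rank=3)
```

In this toy cell the epithelial set scores higher than the mesenchymal set, which is the kind of per-cell contrast a marker-based EMT score is meant to capture.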

      Fig. 2G. The high Ribi during the epithelial state is most likely due to the resumption of cell proliferation of these cells. The authors should check the cell cycle states of these different sets of cells. 

      We agree with the reviewer that higher RiBi activity could be related to the resumption of cell proliferation of mesenchymal tumor cells. To clarify this, we revisited the scRNA-seq data and projected the S phase score onto the scatter plot of RiBi activity versus MET pseudotime. Indeed, cells in the far mesenchymal state showed low S phase scores, while the proliferating cells were mostly detected in the MET-transitioning phase and the epithelial phase (revised Supplementary Figure S6D).

      Suppl Fig. 5 Please correct the figure legends as there is no figure D. 

      We apologize for the mislabeling. We have corrected the figure legend accordingly.

      Figure 3. Please explain the rationale for stimulating cells with FBS for the selected time points. 

      Fig. 3A. The loading control is not even, and quantification is missing. In addition, the authors should explain why the different time points were chosen and why FBS was chosen as a stimulus. In addition, from which passage of cells were these cells? 

      The RFP+ Tri-PyMT cells underwent EMT and switched their expression of fluorescent marker to GFP+ when cultured with FBS. To investigate the response of cells at varying EMT statuses to an FBS-enriched environment, we isolated RFP+, Doub+, and GFP+ cells from the 4th and 5th passages of Tri-PyMT cells and probed downstream signaling pathways after FBS stimuli. The timeline for stimulation was informed by the innate activation profile of these phosphorylation-dependent signals, spanning from 10 minutes to 1 hour. We noted that ERK signaling activation in RFP+ cells occurred within minutes of FBS exposure and diminished within approximately one hour. This ERK signal was more pronounced and persisted longer in Doub+ cells. In contrast, GFP+ cells exhibited a more transient and lower ERK activation (see revised Fig 3A). To address concerns regarding potential uneven loading in our previous assays, we have now included the quantification of Western blots in the revised Fig 3A.

      How and why were ERK and mTORC1 pathways chosen for analysis downstream of increased rRNA synthesis? ERK and mTORC1 have mostly been investigated in the role of cell proliferation which is why the cell cycle status of these cell populations will be important to consider in the context of their findings. 

      The regulation of ribosome biogenesis (RiBi) is mediated by multiple pathways, including the myelocytomatosis oncogene (Myc), mammalian targets of rapamycin (mTOR), and noncoding RNAs, as detailed by Jiao et al. in Signal Transduction and Targeted Therapy (2023). There was no significant difference in Myc expression between tumor cells with epithelial and mesenchymal phenotypes. We thus investigated the activation of the mTOR pathway in sorted RFP+, Doub+, and GFP+ cells. Additionally, given the recognized role of the ERK/MAPK signaling pathway in regulating protein synthesis and cell proliferation, we also analyzed the activation of ERK signals. 

      In alignment with the reviewer's observation regarding the potential correlation between cell proliferation rate and RiBi activation, we further characterized the cell cycle distributions of RFP+, Doub+, and GFP+ cells. Notably, the Doub+ cells exhibited a higher proportion of cells in the proliferative state (including S and G2/M phases) compared with RFP+ and GFP+ cells. Also, a higher percentage of S phase cells was detected in RFP+ cells than in GFP+ cells (revised Supplementary Figure S1B).

      Figure 3 B, C, D. Please provide more information about which cells are analyzed in this figure. 

      We apologize for the previous ambiguity regarding the cells analyzed in these figures. To clarify, the figure legend has been revised to specify that Tri-PyMT cells from the 5th to 10th passages were the subjects of analysis for cell size and nascent protein synthesis, utilizing flow cytometry.

      Figure 3D. The selected images show enlarged nucleoli/ fibrillarin which is an indicator of increased rRNA synthesis however, the authors need to show an increase in rRNA transcripts by q-PCR or Northern blot and also show EU staining in these different cell states to support their claim. 

      We appreciate the reviewer's recommendation to further validate the enhanced ribosome biogenesis (RiBi) in Doub+ cells. In response, we conducted RT-PCR analysis of several RiBi-related genes (revised Supplementary Fig S5). Additionally, we carried out an EU incorporation assay to illustrate the rRNA transcription activity within these cells. The new results have been incorporated into the revised manuscript (Supplementary Fig S7).

      Figure 4 and associated supplementary. In this figure, the authors show that using small molecule Pol I assembly inhibitors (BMH-21 and CX-5461) reduces the expression of mesenchymal proteins. As mentioned in previous comments these results should be put in the context of published work by Prakash et al which demonstrate that upon CX-5461 and genetic silencing of Pol I EMT is hampered as demonstrated by gene expression profiles as well as functional assays. 

      We revised the description of our experiments with Pol I inhibitors in the revised manuscript by including the citation context (Prakash et al Nat Commun, 2019) as mentioned above.  

      Figure 4A. Please provide an explanation of how the doses of Pol I assembly inhibitors were determined and also the selected time points. The Pol I assembly inhibitors should have an effect within a few hours (Drygin, Cancer Research, 2011, Peltonen, Cancer Cell, 24). The authors also need to show that the BMH-21 and CX5461 at selected doses are indeed inhibiting rRNA synthesis in the selected cell populations. The data would also be strengthened by performing ChIP analysis demonstrating that indeed the Pol I complex is disassociated from the rDNA genes upon inhibition. 

      In addition, why are there only 2 reports and how were the statistics done? Were the data normalized to the total number of cells? The graph visually shows a difference in cell numbers. Are cells dying at this concentration? More controls must be included including markers for cell stress, p53, autophagy, and apoptosis. 

      The doses of the Pol I inhibitors were selected based on prior studies, as noted by the reviewer. Peltonen et al. demonstrated that BMH-21 inhibits growth across a wide spectrum of cancer cell lines, achieving a mean half-maximal inhibition of cell proliferation (GI50) at 160 nM (Peltonen K., et al. Cancer Cell. 2014). Consistently, in our experiments, the growth inhibitory effect of BMH-21 on Tri-PyMT cells fell within this range, at approximately 200 nM (Fig 5B, Supplementary Fig S10).

      To address the reviewer's suggestion and verify that the RiBi inhibitors effectively inhibit rRNA synthesis in our study, we conducted an EU incorporation assay. This assay revealed significant inhibition of rRNA synthesis by BMH-21 and CX5461 in Tri-PyMT cells (revised Supplementary Fig S8B). Furthermore, to enhance the robustness of our findings, we repeated the BMH-21 treatment on sorted RFP+ Tri-PyMT cells across three biological replicates, which yielded consistent results.

      Figure 4B. How many replicates were done for this experiment and please provide quantification as per previous comments on WB experiments. The authors should provide a rationale for why Snail and Vimentin were chosen for these studies. Also, the authors should provide a functional assay and demonstrate that cells are less migratory post-treatment and not only markers. 

      Western blots with sorted Tri-PyMT cells were performed twice. We have added the quantification of these blots in the revised manuscript. Snail and Vimentin were chosen to indicate EMT phenotype switches because they are well-studied and commonly used mesenchymal markers of EMT. The association between the fluorescent marker switch and EMT phenotypes such as cell migration was well established in our previous studies (Fischer et al., 2015; Lourenco et al., 2020). The morphology and migration properties of GFP+ cells were clearly distinguishable from those of their RFP+ counterparts. Also, following the reviewer’s suggestion, we performed a migration assay with BMH21 treatment (revised Supplementary Fig 8C). Indeed, treatment with BMH21 or CX5461 inhibited cell migration as expected.

      Supplementary figure 7. The authors need to provide a rationale as to why the two Rps were chosen to inhibit ribosome biogenesis. 

      The two Rps targets were chosen based on their differential expression in Doub+ cells compared with RFP+ and GFP+ cells. We also considered the overall expression level of these genes in Tri-PyMT cells. We have edited the corresponding text in the revised manuscript.

      Figure S7B. In the images shown there does not appear to be a significant change in the number of nucleoli however the cells seem to be smaller. This should be explained. 

      We agree with the reviewer that the box plot does not clearly show the differences in nucleoli between these cells. We now present the data as a violin plot, which shows the result more clearly (revised Supplementary Fig S9B). It is also true that the Rps knockdown cells were relatively smaller than control cells. This is consistent with the finding that EMT-transitioning cells were larger than non-transitioning cells (Fig 3B).

      Figure 5 and Supp 8. The authors should provide the background as to why the specific chemotherapeutic drugs were chosen. 

      The chemotherapeutic agents employed in this study are widely used in the treatment of breast cancer. For instance, Cyclophosphamide (CTX) hampers both DNA replication and RNA transcription; Doxorubicin inhibits DNA replication by disrupting topoisomerase activity; Paclitaxel prevents cell division by stabilizing microtubules; and 5-Fluorouracil (5-FU), a pyrimidine analog, blocks thymidylate synthase, thereby disrupting DNA synthesis. Additionally, some of these agents, such as CTX and 5-FU, may directly or indirectly affect RNA polymerase, prompting us to investigate the synergistic effects of these drugs when used in combination with BMH21. We have included the information in revised manuscript. 

      Fig 5B/Supp 8. Can the authors please explain why only 2 replicates were done and provide a rationale for future statistics? 

      Because serial concentrations of each drug were tested (6 doses for BMH21 and 8 doses for CTX), we arranged the experiment in duplicate on 96-well plates. For the statistical analysis, we conducted dose-response analysis to determine the IC50 values for each drug alone and in combination. Additionally, we calculated the synergy score to assess the interactions between the drugs. The Methods section of the revised manuscript now provides a clearer description of these procedures.
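As an illustration of the two quantities mentioned above, the sketch below estimates an IC50 from a dose series and computes a drug-interaction score. It rests on stated assumptions: the response does not name the curve-fitting or synergy model used, so log-linear interpolation and the Bliss independence model are stand-ins chosen for simplicity, and all function names and numbers are hypothetical.

```python
import math


def ic50_by_interpolation(doses, viability):
    """Estimate IC50 by interpolating on log10(dose) (illustrative method).

    doses:     ascending drug concentrations (any consistent unit).
    viability: fraction of viable cells at each dose (1.0 = untreated level),
               e.g. the mean of duplicate wells.
    Returns the dose at 50% viability, or None if 50% is never crossed.
    """
    points = list(zip(doses, viability))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower to the upper dose.
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


def bliss_excess(inhib_a, inhib_b, inhib_combo):
    """Observed minus expected inhibition under Bliss independence.

    Inputs are fractional inhibitions in [0, 1]. A positive value suggests
    synergy, a negative value antagonism (assumed model, see lead-in).
    """
    expected = inhib_a + inhib_b - inhib_a * inhib_b
    return inhib_combo - expected


# Hypothetical dose series (nM) and viabilities for a single drug.
ic50 = ic50_by_interpolation([10, 100, 1000], [0.9, 0.5, 0.1])

# Hypothetical inhibitions: drug A alone, drug B alone, and the combination.
excess = bliss_excess(0.2, 0.3, 0.6)
```

With duplicate wells, the viability at each dose would first be averaged across the two replicates before interpolation; a full analysis would normally fit a sigmoidal curve across all doses rather than interpolate between two of them.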

      Figure 6. The authors should provide a rationale of why tail veins were chosen as their in vivo model system as the EMT cells do not cause metastasis and if chemoresistance is the main focus of their studies both primary and secondary tumors should be considered. Why was not the MMTVPyMT mouse model chosen where the cells were originally isolated from to test the role of the dual treatment? How was the drug concentration decided and the interval of treatments? 

      We acknowledge the reviewer's concerns regarding the choice of experimental setup for our metastasis model. Certainly, using the original MMTV-PyMT mice for the combination therapy experiment would be the ideal scenario. However, there are potential drawbacks to using these transgenic mice: 1) Multiple primary tumors develop simultaneously but without synchronized timelines (in mice aged 6-9 weeks), and lung metastases also develop asynchronously (from 10-16 weeks of age). This leads to uncontrollable variation in the experimental setup, particularly when establishing multiple treatment groups; 2) Gathering a sufficient number of female transgenic mice of a similar age poses another challenge; 3) The absence of tumor cell labeling complicates assays of EMT/MET phenotype changes during tumor progression. Consequently, we chose to employ our Tri-PyMT model for this experiment. The drug treatment protocol was established after reviewing the literature on the in vivo application of CTX and BMH21 treatment (Peltonen et al. Cancer Cell 2014; Jacobs et al. JBC 2022).

      Figure 6B, C. The authors should provide quantification for these data, how many mice were analyzed, and how many sections were stained and analyzed. 

      We have improved the quality of these fluorescent images and clarified in the legend the methodology used to obtain them, including the number of mice and sections per group. To quantify the differential impact of BMH21 on RFP+ and GFP+ tumor cells, we performed flow cytometry (revised Supplementary Fig S11). We have also changed the presentation of these flow data to improve the clarity of the results.

      Fig 6D. How were the treatment timeline and dosing chosen? LM2 cells are derived from a metastatic site, so they are not transitioning cells they are stably mesenchymal why was this chosen as their in vivo model? 

      LM2 cells were derived from a lung metastasis of the MDA-MB-231 cell line. These cells exhibit a predominantly mesenchymal phenotype in culture. As they grew into metastases in the lung, expression of epithelial markers such as E-cad was upregulated (Supplementary Fig S12), suggesting that a MET process may be involved in the outgrowth of lung metastasis. Therefore, we chose the LM2 cells as our experimental model for assessing the effect of the RiBi inhibitor on MET. The treatment timeline was determined based on previous studies of BMH21 and chemotherapy applications in vivo (Peltonen et al. Cancer Cell 2014; Jacobs et al. JBC 2022).

      Reviewer #3 (Public Review): 

      Summary: 

      Ban et al. investigated the role of ribosome biogenesis (RiBi) in epithelial-to-mesenchymal transition (EMT) and its contribution to chemoresistance in breast cancer. They used a Tri-PyMT EMT lineage-tracing model and scRNA-seq to analyze EMT status and found that RiBi was elevated during both EMT and mesenchymal-to-epithelial transition (MET) of cancer cells. They further revealed that nascent protein synthesis mediated by ERK and mTOR signaling pathways was essential for the completion of RiBi. Inhibiting excessive RiBi impaired EMT and MET capability. More importantly, combinatorial treatment with RiBi inhibitors and chemotherapy drugs reduced metastatic outgrowth of both epithelial and mesenchymal tumor cells. These results suggest that targeting the RiBi pathway may be an effective strategy for treating advanced breast cancer with EMT-related chemoresistance. 

      Strengths: 

      The conclusions of this study are generally supported by the data. However, some weaknesses still exist as mentioned below. 

      Weaknesses: 

      (1) The study predominantly focused on RiBi as a target for overcoming EMT-related chemoresistance. Thus, it will be necessary to provide some canonical outcomes after upregulating ribosome biogenesis, such as translation activity. I would suggest ribosome profiling or puromycin-incorporation assay, or other more suitable experiments. 

      EU incorporation assay (revised Supplementary Fig S7) and puromycin incorporation assay (Fig 3C) were performed.

      (2) The results were basically obtained from mice and in vitro experiments. While these results provide valuable insights, it will be valuable to validate part of the findings using some tissue samples from patients (e.g. RiBi activity) to determine the clinical relevance and potential therapeutic applications.  

      We agree. We have added the analyses on the correlation between patients’ survival and RiBi activation (revised Supplementary Fig S13, S14).

      (3) The results revealed that mTORC1 and ERK mediated RiBi activation. How about mTORC2? It will be informative to evaluate mTORC2 signaling. 

      We investigated the role of the mTORC1 pathway in regulating RiBi activation. It is pertinent to acknowledge that the mTORC1 complex is known to positively regulate protein synthesis through the phosphorylation of ribosomal protein S6 kinase, among other mechanisms. Additionally, Rps6 is recognized as an essential component of the 40S subunit in the ribosome. We agree with the reviewer that mTORC2 may also be involved in RiBi activity, as its activation is mediated through ribosome association (Zinzalla et al., Cell 2011; Prakash et al., Nat Comm 2019). However, this association is more likely to be downstream of RiBi activation, as the RiBi inhibitor CX5461 can block the translocation of Rictor into the nucleus (Prakash et al., Nat Comm 2019).

      We also revisited our sequencing data of RFP+, GFP+, and Doub+ cells. While there was no significant change in the expression of either Rptor or Rictor among these cells, the LSMean (overall expression level) of Rptor was higher than that of Rictor; for example, 163.77 vs 29.95 in RFP+ cells. This suggests that mTORC1 may play a dominant role in regulating RiBi activity in our model.

      Furthermore, we analyzed how Rapamycin (an mTORC1 inhibitor) affects the EMT process in TriPyMT cells. As expected, Rapamycin-treated cells exhibited higher expression of the epithelial marker E-cadherin (Ecad) and lower expression of the mesenchymal markers Snail and Vimentin (Vim) compared to the control (For Reviewers Figure 3).

      (4) The results also demonstrated promising synergic effects of Pol I inhibitor (BMH21) and chemotherapy drug (CTX) on chemo-resistant metastasis. How about using the inhibitors of mTORC1 together with CTX? 

      Several mTOR inhibitors (e.g., sirolimus, temsirolimus, ridaforolimus) have demonstrated antitumor activity. The combination of mTOR inhibitors with various targeted therapies or chemotherapies is being examined in numerous clinical trials, showing promising results. Although combination therapy with mTORC inhibitors and CTX is beyond the scope of our study, we analyzed how mTOR inhibitors may affect the EMT process in our model, as mentioned above. Western blot analysis of EMT markers (E-cadherin, Snail, and Vimentin) showed that rapamycin treatment inhibited the EMT of Tri-PyMT cells (For Reviewers Figure 3).

      (5) While the results demonstrate the potential efficacy of RiBi inhibitors in reducing metastatic outgrowth, other factors and mechanisms contributing to chemoresistance may exist and need further investigation. I would suggest some discussion about this aspect. 

      Following the reviewer’s suggestion, we have expanded the discussion section to include additional future directions.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Please provide the quantified data for all western blots, rather than solely show some representative blots. 

      We thank the reviewer for the suggestion. We have quantified the western blot images, as shown in the revised figures.

      (2) Please add a graphic abstract or schematic to help the readers understand the whole story. 

      We have summarized a schematic graph of our findings in the revised manuscript (Supplementary Fig S15).

      (3) It is hard to read the numbers inside all plots of flow cytometry. 

      High-resolution figures of flow plots are included in the revised manuscript.

      (4) Please provide high-resolution figures for all the synergy plots.

      High-resolution figures of synergy plots are included in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Wang et al. demonstrate that knockdown of DYRK1A results in reduced cell size, which is mediated by mTORC1 activity. They found that DYRK1A interacts with TSC1/TSC2 proteins which leads to the phosphorylation of TSC2 at T1462. Phosphorylation of TSC2 at T1462 inhibits TSC2 activity leading to the activation of mTORC1. The authors complement their findings by demonstrating that overexpression of RHEB (positive regulator of mTORC1) rescues the phenotype of DYRK1A (mnb in flies) mutation in the NMJ.

      The authors' findings on the regulation of cell size and mTORC1 activity by DYRK1A reflect the previous findings of Levy et al. (PMID: 33840455) that cortical deletion of Dyrk1a in mice causes decreased neuronal size associated with a decreased activity of mTORC1 that can be rescued by the inhibition of Pten or supplementation of IGF1.

      The authors demonstrate that T1462 phospho-site at TSC2 is phosphorylated in response to the overexpression of WT but not kinase-dead DYRK1A. However, the authors do not provide any evidence that the regulation of mTORC1 is mediated via phosphorylation of this site. In addition, T1462 site is known to be phosphorylated by Akt. There is a possibility that Akt was co-purified with TSC1/TSC2 complex and DYRK1A promotes phosphorylation of TSC2 indirectly via the activation of AKT that can be tested by using AKT depleted cells.

      We thank the reviewer for reviewing this manuscript and for the critical comments. Various groups have reported the significance of the phosphorylation of TSC2 T1462, along with four other phosphorylation sites, in regulating mTORC1; therefore, we did not address this in the current manuscript (Manning et al. PMID: 12150915, Inoki et al. PMID: 12172553, Zhang et al. PMID: 19593385). Regarding co-purification of AKT with TSC1/TSC2: AKT phosphorylates T1462, S939 and S1387 (Manning et al. PMID: 12150915, Inoki et al. PMID: 12172553, Zhang et al. PMID: 19593385). However, in the in vitro kinase assay, the signal intensities of anti-TSC2 S939 and S1387, with or without ATP, showed no significant difference, suggesting that AKT is not pulled down with TSC1 or TSC2. DYRK1A and kinase-dead DYRK1A were expressed and purified from bacteria. Moreover, multiple studies have purified TSC1 and TSC2 and reported that no AKT was co-purified (Menon et al. PMID: 24529379, Chong-kopera et al. PMID: 16464865).

      RHEB is the most proximal regulator of mTORC1 and can activate mTORC1 even under amino acid starvation. The fact that RHEB overexpression rescues the cell size under DYRK1A depletion or mnb (DYRK1A in Drosophila) mutant phenotype does not prove that DYRK1A regulates the cell size via TSC1 as it would rescue any inhibitory effects upstream to mTORC1.

      We agree with the reviewer that overexpression of RHEB may rescue any inhibitory effects upstream of mTORC1. In the results and discussion sections (page 7, last 3 lines), we state that RHEB overexpression only supports our suggestion that DYRK1A likely works upstream of RHEB. We have, however, performed another experiment to strengthen our hypothesis. We show that the increased cell size phenotype due to DYRK1A overexpression can be suppressed by inhibiting the TORC1 pathway, suggesting that mTORC1 is necessary for DYRK1A-mediated cell growth. These results are presented in Supplementary Figure 4. The results of the two reciprocal experiments (suppression of DYRK1A/Mnb loss-of-function phenotypes by RHEB overexpression, and suppression of DYRK1A gain-of-function phenotypes by TORC1 inhibition), along with the regulation of TSC phosphorylation by DYRK1A, strongly suggest that DYRK1A positively regulates the TSC pathway.

      Reviewer #2 (Public Review):

      This study aims to describe a physical interaction between the kinase DYRK1A and the Tuberous Sclerosis Complex proteins (TSC1, TSC2, TBC1D7). Furthermore, this study aims to demonstrate that DYRK1A, upon interaction with the TSC proteins regulates mTORC1 activity and cell size. Additionally, this study identifies T1462 on TSC2 as a phosphorylation target of DYRK1A. Finally, the authors demonstrate the role of DYRK1A on cell size using human, mouse, and Drosophila cells.

      This study, as it stands, requires further experimentation to support the conclusions on the role of DYRK1A on TSC interaction and subsequently on mTORC1 regulation. Weaknesses include, 1) The lack of an additional assessment of cell growth/size (eg. protein content, proliferation), 2) the limited data on the requirement of DYRK1A for TSC complex stability and function, and 3) the limited perturbations on the mTORC1 pathway upon DYRK1A deletion/overexpression.

      We thank the reviewer for reviewing this manuscript and for the comments. We have previously analyzed the effect of DYRK1A knockdown on the proliferation of THP cells (human leukemia monocytic cell line) (Li Shanshan et al. PMID: 30137413) and have shown that DYRK1A knockdown negatively affects cell proliferation. Other studies have also shown a role for DYRK1A in cell proliferation, including in foreskin fibroblasts (Chen et al. PMID: 24119401) and HepG2 cells (Frendo-Cumbo et al. PMID: 36248734). mTORC1 regulates several pathways, including protein synthesis, lipid synthesis, nucleotide synthesis, autophagy, and stress responses. We have not measured protein content, as this parameter is directly affected by TORC1 activation and may not be a suitable measure of cell growth. A large number of studies of mTORC1 regulation analyze the levels of S6K and S6 phosphorylation, as these are direct readouts of mTORC1 function (Prentzell et al. PMID: 33497611, Zhang et al. PMID: 17052453, Ben-Sahra et al. PMID: 23429703, Düvel et al. PMID: 20670887, Zhang et al. PMID: 25043031). Therefore, we used these markers to assess the status of the mTORC1 pathway.

      (2) ..the limited data on the requirement of DYRK1A for TSC complex stability and function,

      We agree with this limitation in our study. We have not seen a significant difference in TSC1 or TSC2 protein levels in DYRK1A knockdown or overexpressing cells, so we did not follow up on this aspect.

      ..and 3) the limited perturbations on the mTORC1 pathway upon DYRK1A deletion /overexpression.

      We have performed an additional experiment in which we overexpressed DYRK1A and showed that the increased cell size phenotype caused by DYRK1A overexpression can be suppressed by inhibiting the TORC1 pathway, suggesting that mTORC1 is necessary for DYRK1A-mediated cell growth. These results are presented in Supplementary Figure 4. The results of the two reciprocal experiments (suppression of DYRK1A/Mnb loss-of-function phenotypes by RHEB overexpression, and suppression of DYRK1A gain-of-function phenotypes by TORC1 inhibition), along with the regulation of TSC phosphorylation by DYRK1A, suggest that DYRK1A positively regulates the TSC pathway.

      Finally, this study would benefit from identifying under which nutrient conditions DYRK1A interacts with the TS complex to regulate mTORC1. The interaction described here is highly impactful to the field of mTORC1-regulated cell growth and uncovers a previously unrecognized TSC-associated interacting protein. Further characterization of the role that DYRK1A plays in regulating mTORC1 activation and the upstream signals that stimulate this interaction will be extremely important for multiple diseases that exhibit mTORC1 hyper-activation.

      We agree that identifying nutrients (or physiological conditions) that affect DYRK1A-mediated TSC regulation will be important to understanding the additional complexity in context-dependent mTORC1 activation/deactivation. This study has not addressed those issues, particularly due to DYRK1A's pleiotropic nature. DYRK1A has many substrates, and both overexpression and loss of DYRK1A lead to multiple phenotypes. Identifying nutrient conditions or growth factors that can regulate the activation of DYRK1A is not yet known and would require an independent investigation.

      Reviewer #3 (Public Review):

      The manuscript describes a combination of in vitro and in vivo results implicating Dyrk1a in the regulation of mTORC. Particular strengths of the data are this combination of cell and whole animal (drosophila) based studies. However, most of the experiments seem to lack a key additional experimental condition that could increase confidence in the authors' conclusions. Overall some tantalizing data is presented. However, there are several issues that should be clarified or otherwise addressed with additional data.

      We thank the reviewer for reviewing and commenting on this manuscript.

      (1) In Figure 1G, why not test overexpression levels of Dyrk1a via western rather than only looking at the RNA levels?

      Induced overexpression of DYRK1A was probed by analyzing mRNA levels, as the concentration of doxycycline used (0-100 ng/ml) did not produce enough protein to be detected by an anti-FLAG antibody in a western blot. We have modified the sentence (page 5, paragraph 1).

      (2) In Figure 2, while there is clearly TSC1 protein in the Dyrk1a and FLAG-Dyrk1a IPs that supports an interaction between the proteins, it would be good to see the reciprocal IP experiment wherein TSC1 or TSC2 are pulled down and then the blot probed for Dyrk1a.

      In the revised manuscript, we have provided evidence that TSC1 and TSC2 can interact with endogenous DYRK1A. We have performed immunoprecipitation of affinity-tagged TSC1 or TSC2 and have probed for the enrichment of DYRK1A (Supplementary Figure S2).

      (3) Figures 3A and 3D tested the effects of Dyrk1a knockdown using different methods in different cell lines. This is a reasonable approach to ascertain the generalizability of findings. However, each experiment is performed differently. For example, in 3A, the authors found no difference in baseline pS6, so they did a time course of treatment to induce phosphorylation and found differences depending on Dyrk1a expression. In 3D, they only show baseline effects from the CRISPR knockdown. Why not do the time course as well for consistency? Also, why is there an inconsistency in approaches wherein one shows baseline effects and the other does not? The authors could also consider the pharmacologic inhibition of Dyrk1a activity as well.

      We agree that different methods were used in different cell lines to assess the effect of DYRK1A. Since DYRK1A is a pleiotropic gene, its manipulation has diverse effects in different cell lines. Also, not all cell types have similar levels of mTORC activity. Hence, we had to adopt different strategies in different cell types, which accounts for the inconsistency in the methodology. However, various groups have used both approaches to assess mTORC1 activity through S6 and S6K phosphorylation: starvation followed by stimulation, and direct measurement in cycling cells (Prentzell et al. PMID: 33497611, Zhang et al. PMID: 17052453, Ben-Sahra et al. PMID: 23429703, Düvel et al. PMID: 20670887, Zhang et al. PMID: 25043031). shRNA-mediated knockdown in HEK293 cells does not change S6 or S6K phosphorylation levels in actively growing cells, whereas cycling NIH3T3 cells show a significant reduction in S6 and S6K phosphorylation. As suggested, we attempted pharmacological inhibition of DYRK1A, treating HEK293 cells with 1 µM harmine and then starving them. However, the treated and starved cells began to float and die in large numbers, so we did not pursue this experiment further.

      (4) In Figure 4, RHEB overexpression increases cell size in both Dyrk1a wt and Dyrk1a shRNA treated cells, although the magnitude of the effect appears reduced in Dyrk1a shRNA cells. However, there is the possibility here that RHEB acts independently of Dyrk1a. Why not also do the experiment of Figure 1 wherein Dyrk1a is overexpressed and then knock down RHEB in that context? If the hypothesis is supported, then RHEB knockdown should eliminate the cell size effect of Dyrk1a overexpression.

      We thank the reviewer for suggesting this experiment.  We have overexpressed DYRK1A using the inducible HEK293A-Flag-DYRK1A overexpression system and treated cells with mTOR inhibitors (Rapamycin or Torin1). The results are added to the supplementary figure S4. Our results show that the increased cell size phenotype due to DYRK1A overexpression can be suppressed by inhibiting the TORC1 pathway. This suggests that mTORC1 is necessary for DYRK1A-mediated cell growth. This data further supports the hypothesis that DYRK1A is a positive regulator of the mTORC1 pathway.

      (5) The discussion should incorporate relevant findings from other models, such as Arabidopsis. Barrada et al., Development (2019), 146 (3).

      We have incorporated the findings from Arabidopsis (Barrada et al., Development (2019), 146 (3) PMID: 30705074) in the last paragraph of the discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To demonstrate that DYRK1A can phosphorylate T1462 phospho-site at TSC2 in the absence of Akt using genetic and pharmacological approaches (by using pan-Akt small molecule inhibitors).

      We have performed an in vitro kinase assay using recombinant DYRK1A and TSC1/TSC2 affinity-purified from HEK293 cells. However, we have not been able to perform this experiment by overexpressing DYRK1A in human cells, as 1) strong overexpression of DYRK1A leads to cell cycle exit, as demonstrated by various laboratories (Soppa et al. PMID: 24806449, Hämmerle et al. PMID: 21610031, Najas et al. PMID: 26137553, Park et al. PMID: 20696760) and by our own observations, and 2) the T1462 antibody signal is weak and cannot be seen in cellular extracts. We have attempted this experiment with at least three different batches of T1462 antibody from CST without success.

      (2) To demonstrate that endogenous phospho-mutant/mimetic substitution of T1462 phospho-site at TSC2 is sufficient to prevent the regulation of cell size/NMJ phenotype in Drosophila by DYRK1A (mnb).

      This is an interesting experiment, and we thank the reviewer for this suggestion. However, we are skeptical about interpreting the possible results. Since T1462 substitution would also block regulation by other kinases, e.g., Akt, and may constitutively suppress mTORC1, any interpretation would be confounded.

      Reviewer #2 (Recommendations For The Authors):

      (1) In section 2.1 the authors claim that DYRK1A down-regulation enhances cell growth. An additional assessment of cell growth or size would strengthen this statement. Is total protein content also increased upon DYRK1A overexpression? Does DYRK1A KD also increase cell proliferation? In Figure 1, providing the median or mean size of cells in each condition will help the reader understand the impact of DYRK1A on cell size. In Supplementary Figure 1, the important statistical differences should be highlighted.

      We have not claimed that down-regulation of DYRK1A enhances cell growth. We have not tested the protein content in a cell directly. Knockdown of DYRK1A leads to a reduction in cell proliferation, as shown by various groups, including ours (Shanshan Li PMID: 30137413, Luna et al. PMID: 30343272). Cell size is a very dynamic process and is variable within the population. All the studies measuring cell size show the size using assays on a population of cells. We have not been able to figure out a way to display the median or mean cell size that accurately reflects the cell size of the whole population. 

      (2) In section 2.2 the authors describe the interaction between DYRK1A and the TSC proteins. Do the DYRK1A mutants impact interaction with TSC2 and TBC1D7 or is this specific to TSC1?

      We have not tested this possibility.

      (3) In section 2.3, more detailed perturbations of the mTORC1 pathway are needed. Is the mTORC1 activation observed sensitive to rapamycin treatment? Since mTORC1 regulates cell size via S6 ribosomal protein and transcription via 4EBP1, phosphorylation of 4EBP1 should also be considered. In Figure 3A, what is the level of DYRK1A down-regulation? It is unclear how many shRNA constructs were used or whether these were pooled constructs or single clones. If one shRNA/sgRNA is used, it would be very helpful to validate some of the key findings of this study with at least one more clone.

      Many research studies have measured the activity of various mTORC1 substrates, the most commonly used being the phosphorylation of S6 and S6K. We agree that analyzing 4EBP1 would make the study more comprehensive, but to complete the study with our limited resources and in a limited time, we have not attempted to establish the 4EBP1 phosphorylation status. We have used a previously described and validated DYRK1A shRNA (as mentioned in the methods section).

      (4) In section 2.3 is T1462 an activating or inhibiting phosphorylation event? If DYRK1A phosphorylates and activates mTORC1 via RHEB, shouldn't that result in the inhibition of mTORC1?

      Multiple laboratories have demonstrated that T1462 phosphorylation leads to reduced TSC complex activity and, hence, increased mTORC1 activity (Manning et al. PMID: 12150915, Inoki et al. PMID: 12172553, Zhang et al. PMID: 19593385).

      (5) In section 2.4, what is the status of AKT phosphorylation? Would an AKT inhibitor be useful in this scenario?

      AKT phosphorylates T1462, S939 and S1360, as demonstrated by others. However, in our in vitro kinase assay, the following facts suggest that AKT is not involved in the T1462 phosphorylation we observed:

      (1) Signal intensities of anti-TSC2 S939 and S1387, with or without ATP, do not show any significant differences, suggesting that AKT is not pulled down with TSC1 or TSC2.

      (2) Multiple studies have performed phosphorylation studies of TSC1 and TSC2 and have not reported any co-purification of AKT.

      (6) Very minor grammar errors were observed, mostly at the beginning of the manuscript.

      We have done our best to correct the grammatical errors.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Yang et al. conduct a comprehensive investigation to demonstrate the role of adipose tissue Mir802 in obesity-associated inflammation and metabolic dysfunction. Using multiple models and techniques, they propose a mechanism where elevated levels of Mir802 in adipose tissue (both in mouse models and humans) trigger fat accumulation and inflammation, leading to increased adiposity and insulin resistance. They suggest that increased Mir802 levels in adipocytes during obesity result in the downregulation of TRAF3, a negative regulator of canonical and non-canonical NF-κB pathways. This downregulation induces inflammation through the production of cytokines/chemokines that attract and polarize macrophages. Concurrently, the NF-κB pathway induces the lipogenic transcriptional factor SREBP1, which promotes fat accumulation and further recruits pro-inflammatory macrophages. While the proposed model is supported by multiple experiments and consistent data, there are areas where the manuscript could be improved. Some improvements can be addressed in the text, while others require additional controls, experiments, or analyses.

      1) The manuscript should provide measurements of lipid droplet/adipocyte size for all models, both in vitro and in vivo. In vivo studies should also include fat weight measurements. This is crucial to determine whether Mir802, TRAF3, and SREBP1 promote adiposity/fat accumulation across all models.

      Thank you for your careful review. As suggested, we have measured lipid droplet and adipocyte size (Figures 1J, 2A, S2I, 3F, 3L, S3L, 5I); we hope this addition makes the manuscript clearer to you and to other readers. The in vivo studies now include fat weight measurements (Figure 2K, L; Figure 3C, D; Figure 5N). Our results show that adipose-selective overexpression of Mir802 induced adipogenesis during high-fat diet feeding.

      2) The rationale for co-culture experiments using WAT SVF is unclear, given that Mir802 is upregulated by obesity in adipocytes, not in the stromal-vascular fraction. These experiments would be more relevant if performed using isolated adipocytes or differentiated WAT SVF.

      Thank you for this important point. We apologize for our inaccurate wording. In our study, we co-cultured differentiated WAT SVF with primary macrophages, as described in the Migration and invasion assays section of the Methods. We have revised the flowchart of the co-culture experiments accordingly (Figure 4A). We hope that this modification will enhance readers' comprehension of our manuscript.

      3) Figures 1G and 1H lack a control group (time 0 or NCD). Without this control, it is impossible to determine if inflammation precedes Mir802 upregulation.

      Thank you for this insightful comment. In our earlier experiments, we had tested a 0-week high-fat diet treatment group for Figures 1I and 1J, and we have now added these data to the manuscript. We hope this addition strengthens our conclusion that inflammation precedes Mir802 upregulation.

      4) The statement, "The knockout of Mir802 in adipose tissue did not alter food intake, body weight, glucose level, and adiposity (data not shown)," needs more detail regarding the age and sex of the animals. These data are important and should be reported, perhaps in a supplementary figure.

      Thank you for your careful review. To strengthen our conclusions, we have added data on food intake, body weight, glucose level, and adiposity for Mir802 KO mice fed a normal chow diet (NCD, Supplementary Figure 3E-I).

      …The knockout of Mir802 in adipose tissue did not alter food intake, body weight, glucose levels, and adiposity compared with their WT littermates in both males and females when they were fed with NCD (Figure S3E-I)…

      5) The terms "KO" (knockout) and "KI" (knock-in) are misleading for AAV models, as they do not modify the genome. "KD" (knockdown) and "OE" (overexpression) are more accurate.

      Thank you for your good advice. We apologize for our inaccurate wording and have revised the text accordingly. The AAV models for Mir802 knockdown (Figure 3) and Traf3 overexpression (Figure 5) are now labeled KD and OE, respectively.

      6) The statement, "Mir802 expression was unaffected in other organs (Figure S3O)," should clarify that this is except for BAT.

      We appreciate this insightful comment. We have clarified that Mir802 expression was unaffected in other organs except for BAT (Figure S3T, revised manuscript).

      By addressing these points, the manuscript would present a more robust and clear demonstration of the role of Mir802 in obesity-associated inflammation and metabolic dysfunction.

      Thank you for your positive comments. As suggested, we have addressed all of these points.

      Reviewer #2 (Public Review):

      Yang et al. investigated the role of Mir802 in the development of adipose tissue (AT) inflammation during obesity. The authors found Mir802 levels are up-regulated in the AT of mouse models of obesity and insulin resistance as well as in the AT of humans. They further demonstrated that Mir802 regulates the intracellular levels of TRAF3 and downstream activation of the NF-kB pathway. Ultimately, controlling AT inflammation by manipulating Mir802 affected whole-body glucose homeostasis, highlighting the role of AT inflammatory status in whole-body metabolism. The study provides solid evidence on the role of adipocyte Mir802 in controlling inflammation and macrophage recruitment. However, how lipid mobilization from adipocytes and how engulfment of lipid droplets by macrophages control inflammatory phenotype in these cells could be better explored. The findings of this study will have a great impact in the field, contributing to the growing body of evidence on how microRNAs control the inflammatory microenvironment of AT and whole-body metabolism in obesity.

      Thanks for your positive comments.

      Reviewer #3 (Public Review):

      Mir802 appears to accumulate before macrophage numbers increase in adipose tissue in both mice and humans. The phenotype of Mir802 overexpression and deletion in vivo is striking and novel. Deletion of Mir802 in adipose tissue after obesity onset also attenuated adipose inflammation and improved systemic glucose homeostasis. Understanding how Mir802 affects the crosstalk between macrophage and adipocyte is a major point. For example, does Mir802 change the inflammatory state of macrophages as it increases Traf3 expression in adipocytes? This is important because macrophages are the source of inflammatory mediators that will activate the TNFR receptor signaling pathway, potentially Traf3, resulting in impaired insulin-stimulated Glut4 translocation and glucose uptake. Also, modulation of Mir802 levels in vivo leads to alterations in adiposity. Here, what is a direct effect of Mir802 and what is a result of simply reduced adiposity? One key point is what triggers Mir802 expression, especially in obesity.

      Thank you for your important suggestions. Following them, we have added further data to the revised manuscript to strengthen our conclusions.

    1. Author response:

      Reviewer #1 (Public Review):

      In this paper, Tompary & Davachi present work looking at how memories become integrated over time in the brain, and relating those mechanisms to responses on a priming task as a behavioral measure of memory linkage. They find that remotely but not recently formed memories are behaviorally linked and that this is associated with a change in the neural representation in mPFC. They also find that the same behavioral outcomes are associated with the increased coupling of the posterior hippocampus with category-sensitive parts of the neocortex (LOC) during a post-learning rest period-again only for remotely learned information. There was also correspondence in rest connectivity (posterior hippocampus-LOC) and representational change (mPFC) such that for remote memories specifically, the initial post-learning connectivity enhancement during rest related to longer-term mPFC representational change.

      This work has many strengths. The topic of this paper is very interesting, and the data provide a really nice package in terms of providing a mechanistic account of how memories become integrated over a delay. The paper is also exceptionally well-written and a pleasure to read. There are two studies, including one large behavioral study, and the findings replicate in the smaller fMRI sample. I do however have two fairly substantive concerns about the analytic approach, where more data will be required before we can know whether the interpretations are an appropriate reflection of the findings. These and other concerns are described below.

      Thank you for the positive comments! We are proud of this work, and we feel that the paper is greatly strengthened by the revisions we made in response to your feedback. Please see below for specific changes that we’ve made.

      1) One major concern relates to the lack of a pre-encoding baseline scan prior to recent learning.

      a) First, I think it would be helpful if the authors could clarify why there was no pre-learning rest scan dedicated to the recent condition. Was this simply a feasibility consideration, or were there theoretical reasons why this would be less "clean"? Including this information in the paper would be helpful for context. Apologies if I missed this detail in the paper.

      This is a great point and something that we struggled with when developing this experiment. We considered several factors when deciding whether to include a pre-learning baseline on day two. First, the day 2 scan session was longer than that of day 1 because it included the recognition priming and explicit memory tasks, and the addition of a baseline scan would have made the length of the session longer than a typical scan session – about 2 hours in the scanner in total – and we were concerned that participant engagement would be difficult to sustain across a longer session. Second, we anticipated that the pre-learning scan would not have been a ‘clean’ measure of baseline processing, but rather would include signal related to post-learning processing of the day 1 sequences, as multivariate reactivation of learned stimuli has been observed in rest scans collected 24 hours after learning (Schlichting & Preston, 2014). We have added these considerations to the Discussion (page 39, lines 1047-1070).

      b) Second, I was hoping the authors could speak to what they think is reflected in the post-encoding "recent" scan. Is it possible that these data could also reflect the processing of the remote memories? I think, though am not positive, that the authors may be alluding to this in the penultimate paragraph of the discussion (p. 33) when noting the LOC-mPFC connectivity findings. Could there be the reinstatement of the old memories due to being back in the same experimental context and so forth? I wonder the extent to which the authors think the data from this scan can be reflected as strictly reflecting recent memories, particularly given it is relative to the pre-encoding baseline from before the remote memories, as well (and therefore in theory could reflect both the remote + recent). (I should also acknowledge that, if it is the case that the authors think there might be some remote memory processing during the recent learning session in general, a pre-learning rest scan might not have been "clean" either, in that it could have reflected some processing of the remote memories-i.e., perhaps a clean pre-learning scan for the recent learning session related to point 1a is simply not possible.)

      We propose that, theoretically, the post-learning recent scan could indeed reflect a mixture of remote and recent sequences. This is one of the drawbacks of splitting encoding into two sessions rather than combining encoding into one session and splitting retrieval into an immediate and delayed session; any rest scans that are collected on Day 2 may have signal that relates to processing of the Day 1 remote sequences, which is why we decided against the pre-learning baseline for Day 2, as you had noted.

      You are correct that we alluded to this in our original submission when discussing the LOC-mPFC coupling result, and we have taken steps to discuss this more explicitly. In brief, we find greater LOC-mPFC connectivity only after recent learning relative to the pre-learning baseline, and cortical-cortical connectivity could be indicative of processing memories that already have undergone some consolidation (Takashima et al., 2009; Smith et al., 2010). From another vantage point, the mPFC representation of Day 1 learning may have led to increased connectivity with LOC on Day 2 due to Day 1 learning beginning to resemble consolidated prior knowledge (van Kesteren et al., 2010). While this effect is consistent with prior literature and theory, it's unclear why we would find evidence of processing of the remote memories and not the recent memories. Furthermore, the change in LOC-mPFC connectivity in this scan did not correlate with memory behaviors from either learning session, which could be because signal from this scan reflects a mix of processing of the two different learning sessions. With these ideas in mind, we have fleshed out the discussion of the post-encoding ‘recent’ scan in the Discussion (page 38-39, lines 1039-1044).

      c) Third, I am thinking about how both of the above issues might relate to the authors' findings, and would love to see more added to the paper to address this point. Specifically, I assume there are fluctuations in baseline connectivity profile across days within a person, such that the pre-learning connectivity on day 1 might be different from on day 2. Given that, and the lack of a pre-learning connectivity measure on day 2, it would logically follow that the measure of connectivity change from pre- to post-learning is going to be cleaner for the remote memories. In other words, could the lack of connectivity change observed for the recent scan simply be due to the lack of a within-day baseline? Given that otherwise, the post-learning rest should be the same in that it is an immediate reflection of how connectivity changes as a function of learning (depending on whether the authors think that the "recent" scan is actually reflecting "recent + remote"), it seems odd that they both don't show the same corresponding increase in connectivity-which makes me think it may be a baseline difference. I am not sure if this is what the authors are implying when they talk about how day 1 is most similar to prior investigation on p. 20, but if so it might be helpful to state that directly.

      We agree that it is puzzling that hippocampal-LOC connectivity does not also increase after recent learning, as it does after remote learning. However, the fact that there is an increase from baseline rest to post-recent rest in mPFC – LOC connectivity suggests that it’s not an issue with baseline, but rather that the post-recent learning scan is reflecting processing of the remote memories (although as a caveat, there is no relationship with priming).

      On what is now page 23, we were referring to the notion that the Day 1 procedure (baseline rest, learning, post-learning rest) is the most straightforward replication of past work that finds a relationship between hippocampal-cortical coupling and later memory. In contrast, the Day 2 learning and rest scan are less ‘clean’ of a replication in that they are taking place in the shadow of Day 1 learning. We have clarified this in the Results (page 23, lines 597-598).

      d) Fourth and very related to my point 1c, I wonder if the lack of correlations for the recent scan with behavior is interpretable, or if it might just be that this is a noisy measure due to imperfect baseline correction. Do the authors have any data or logic they might be able to provide that could speak to these points? One thing that comes to mind is seeing whether the raw post-learning connectivity values (separately for both recent and remote) show the same pattern as the different scores. However, the authors may come up with other clever ways to address this point. If not, it might be worth acknowledging this interpretive challenge in the Discussion.

      We thought of three different approaches that could help us to understand whether the lack of correlations between coupling and behavior in the recent scan was due to noise. First, we correlated recognition priming with raw hippocampal-LOC coupling separately for pre- and post-learning scans, as in Author response image 1:

      Author response image 1.

      Note that the post-learning chart depicts the relationship between post-remote coupling and remote priming and between post-recent coupling and recent priming (middle). Essentially, post-recent learning coupling did not relate to priming of recently learned sequences (middle; green) while there remains a trend for a relationship between post-remote coupling and priming for remotely learned sequences (middle; blue). However, the significant relationship between coupling and priming that we reported in the paper (right, blue) is driven both by the initial negative relationship that is observed in the pre-learning scan and the positive relationship in the post-remote learning scan. This highlights the importance of using a change score, as there may be spurious initial relationships between connectivity profiles and to-be-learned information that would then mask any learning- and consolidation-related changes.
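
      To illustrate why a change score can show a stronger relationship than the raw post-learning value, the following is a minimal sketch, not the authors' analysis code. All numbers are hypothetical (six invented participants), and the names `pearson_r`, `pre_coupling`, `post_coupling`, and `priming` are introduced here only for illustration. It assumes a weak negative pre-learning relationship and a positive post-learning relationship, as described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six participants (not from the study).
priming = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pre_coupling = [0.30, 0.28, 0.22, 0.20, 0.15, 0.12]   # baseline rest
post_coupling = [0.18, 0.22, 0.20, 0.26, 0.24, 0.30]  # post-learning rest

# Change score: post-learning coupling minus baseline coupling.
change = [post - pre for pre, post in zip(pre_coupling, post_coupling)]

r_pre = pearson_r(pre_coupling, priming)
r_post = pearson_r(post_coupling, priming)
r_change = pearson_r(change, priming)

print(f"pre: {r_pre:.2f}  post: {r_post:.2f}  change: {r_change:.2f}")
```

      In this toy example the change-score correlation exceeds the raw post-learning correlation because subtracting the baseline removes the initial negative relationship, which is the pattern the response describes.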

      We also reasoned that if comparisons between the post-recent learning scan and the baseline scan are noisier than between the post-remote learning and baseline scan, there may be differences in the variance of the change scores across participants, such that changes in coupling from baseline to post-recent rest may be more variable than changes in coupling from baseline to post-remote rest. We conducted F-tests to compare the variance of the change in these two hippocampal-LOC correlations and found no reliable difference (ratio of difference: F(22, 22) = 0.811, p = .63).
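      A minimal sketch of a variance-ratio F-test of this kind is given below, using only the Python standard library. It is an illustration under stated assumptions rather than the analysis code used for the response: it assumes a two-sided test, treats the two sets of change scores as independent samples, and integrates the F density numerically (in practice a library routine such as `scipy.stats.f.cdf` would normally be used). The function names are hypothetical.

```python
import math
from statistics import variance

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom.

    Returns 0 at x <= 0, which is correct for d1 > 2 (assumed here).
    """
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) via Simpson's rule on [0, x]; steps must be even."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def variance_ratio_test(sample_a, sample_b):
    """Return (F, (df1, df2), two-sided p) for var(sample_a) / var(sample_b)."""
    f_stat = variance(sample_a) / variance(sample_b)
    d1, d2 = len(sample_a) - 1, len(sample_b) - 1
    lower_tail = f_cdf(f_stat, d1, d2)
    return f_stat, (d1, d2), min(1.0, 2 * min(lower_tail, 1 - lower_tail))

# Two-sided p-value for the statistic reported in the text, F(22, 22) = 0.811.
f_reported = 0.811
lower_tail = f_cdf(f_reported, 22, 22)
p_reported = 2 * min(lower_tail, 1 - lower_tail)
print(f"F(22, 22) = {f_reported}, two-sided p = {p_reported:.2f}")
```

      Because the same participants contribute to both change scores, the independence assumption is a simplification; a test designed for dependent variances would be an alternative.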

      Finally, we evaluated whether hippocampal-LOC coupling was more stable across participants when compared across two rest scans within the same imaging session (baseline and post-remote) than across two rest scans from separate sessions (baseline and post-recent; see Author response image 2). We reasoned that if coupling was more correlated across baseline and post-remote scans than across baseline and post-recent scans, that would indicate within-session stability of participants’ connectivity profiles. At the same time, a weaker correlation of coupling across baseline and post-recent scans would indicate a noisier change measure, as the measure would additionally include a change in individuals’ connectivity profiles over time. Interestingly, we found no difference in the correlation of hippocampal-LOC coupling across sessions, and the correlation was not reliable in either case (baseline/post-remote: r = 0.03, p = 0.89; baseline/post-recent: r = 0.07, p = .74; difference: Steiger’s t = 0.12, p = 0.9).

      Author response image 2.

      We have included the raw correlations with priming (page 25, lines 654-661, Supplemental Figure 6) as well as text describing the comparison of variances (page 25, lines 642-653). We did not add the comparison of hippocampal-LOC coupling across scans to the current manuscript, as an evaluation of stability of such coupling in the context of learning and reactivation seems out of scope of the current focus of the experiment, but we find this result to be worthy of follow-up in future work.

      In summary, further analysis of our data did not reveal any indication that a comparison of rest connectivity across scan sessions inserted noise into the change score between baseline and post-recent learning scans. However, these analyses cannot fully rule that possibility out, and the current analyses do not provide concrete evidence that the post-recent learning scan comprises signals that are a mixture of processing of recent and remote sequences. We discuss these drawbacks in the Discussion (page 39, lines 1047-1070).

      2) My second major concern is how the authors have operationalized integration and differentiation. The pattern similarity analysis uses an overall correspondence between the neural similarity and a predicted model as the main metric. In the predicted model, C items that are indirectly associated are more similar to one another than they are to C items that are entirely unrelated. The authors are then looking at a change in correspondence (correlation) between the neural data and that prediction model from pre- to post-learning. However, a change in the degree of correspondence with the predicted matrix could be driven by either the unrelated items becoming less similar or the related ones becoming more similar (or both!). Since the interpretation in the paper focuses on change to indirectly related C items, it would be important to report those values directly. For instance, as evidence of differentiation, it would be important to show that there is a greater decrease in similarity for indirectly associated C items than there is for unrelated C items (or even a smaller increase) from pre to post, or that C items that are indirectly related are less similar than are unrelated C items post but not pre-learning. Performing this analysis would confirm that the pattern of results matches the authors' interpretation. This would also impact the interpretation of the subsequent analyses that involve the neural integration measures (e.g., correlation analyses like those on p. 16, which may or may not be driven by increased similarity among overlapping C pairs). I should add that given the specificity to the remote learning in mPFC versus recent in LOC and anterior hippocampus, it is clearly the case that something interesting is going on. However, I think we need more data to understand fully what that "something" is.

      We recognize the importance of understanding whether model fits (and changes to them) are driven by similarity of overlapping pairs or non-overlapping pairs. We have modified all figures that visualize model fits to the neural integration model to separately show fits for pre- and post-learning (Figure 3 for mPFC, Supp. Figure 5 for LOC, Supp. Figure 9 for AB similarity in anterior hippocampus & LOC). We have additionally added supplemental figures to show the complete breakdown of similarity in each region in a 2 (pre/post) x 2 (overlapping/non-overlapping sequence) x 2 (recent/remote) chart. We decided against replacing the model fits with these latter charts, since the model fits strike a good balance between information and readability. We have also modified text in various sections to focus on these new results.

      In brief, the decrease in model fit for mPFC for the remote sequences was driven primarily by a decrease in similarity for the overlapping C items and not the non-overlapping ones (Supplementary Figure 3, page 18, lines 468-472).

      Interestingly, in LOC, all C items grew more similar after learning, regardless of their overlap or learning session, but the increase in model fit for C items in the recent condition was driven by a larger increase in similarity for overlapping pairs relative to non-overlapping ones (Supp. Figure 5, page 21, lines 533-536).

      We also visualized AB similarity in the anterior hippocampus and LOC in a similar fashion (Supplementary Figure 9).

      We have also edited the Methods sections with updated details of these analyses (page 52, lines 1392-1397). We think that including these results considerably strengthens our claims and we are pleased to have them included.

      3) The priming task occurred before the post-learning exposure phase and could have impacted the representations. More consideration of this in the paper would be useful. Most critically, since the priming task involves seeing the related C items back-to-back, it would be important to consider whether this experience could have conceivably impacted the neural integration indices. I believe it never would have been the case that unrelated C items were presented sequentially during the priming task, i.e., that related C items always appeared together in this task. I think again the specificity of the remote condition is key and perhaps the authors can leverage this to support their interpretation. Can the authors consider this possibility in the Discussion?

      It's true that only C items from the same sequence were presented back-to-back during the priming task, and that this presentation may interfere with observations from the post-learning exposure scan that followed it. We agree that it is worth considering this caveat and have added language in the Discussion (page 40, lines 1071-1086). When designing the study, we reasoned that it was more important for the behavioral priming task to come before the exposure scans, as all items were shown only once in that task, whereas they were shown 4-5 times in a random order in the post-learning exposure phase. Because of this difference in presentation times, and because behavioral priming findings tend to be very sensitive, we concluded that it was more important to protect the priming task from the exposure scan instead of the reverse.

      We reasoned, however, that the additional presentation of the C items in the recognition priming task would not substantially override the sequence learning, as C items were each presented 16 times in their sequence (ABC1 and ABC2 16 times each). Furthermore, as this reviewer suggests, the order of C items during recognition was the same for recent and remote conditions, so the fact that we find a selective change in neural representation for the remote condition and don’t also see that change for the recent condition is additional assurance that the recognition priming order did not substantially impact the representations.

      4) For the priming task, based on the Figure 2A caption it seems as though every sequence contributes to both the control and primed conditions, but (I believe) this means that the control transition always happens first (and they are always back-to-back). Is this a concern? If RTs are changing over time (getting faster), it would be helpful to know whether the priming effects hold after controlling for trial numbers. I do not think this is a big issue because if it were, you would not expect to see the specificity of the remotely learned information. However, it would be helpful to know given the order of these conditions has to be fixed in their design.

      This is a correct understanding of the trial orders in the recognition priming task. We chose to involve the baseline items in the control condition to boost power – this way, priming of each sequence could be tested, while only presenting each item once in this task, as repetition in the recognition phase would have further facilitated response times and potentially masked any priming effects. We agree that accounting for trial order would be useful here, so we ran a mixed-effects linear model to examine response times both as a function of trial number and of priming condition (primed/control). While there is indeed a large effect of trial number such that participants got faster over time, the priming effect originally observed in the remote condition still holds when trial number is accounted for. We now report this analysis in the Results section (page 14, lines 337-349 for Expt 1 and pages 14-15, lines 360-362 for Expt 2).

      5) The authors should be cautious about the general conclusion that memories with overlapping temporal regularities become neurally integrated - given their findings in MPFC are more consistent with overall differentiation (though as noted above, I think we need more data on this to know for sure what is going on).

      We realize this conclusion was overly simplistic and, in several places, have revised the general conclusions to be more specific about the nuanced similarity findings.

      6) It would be worth stating a few more details and perhaps providing additional logic or justification in the main text about how the pre- and post-exposure phases were set up and why. How many times each object was presented pre and post, and how the sequencing was determined (were any constraints put in place e.g., such that C1 and C2 did not appear close in time?). What was the cover task (I think this is important to the interpretation & so belongs in the main paper)? Were there considerations involving the fact that this is a different sequence of the same objects the participants would later be learning - e.g., interference, etc.?

      These details can be found in the Methods section (pages 50-51, lines 1337-1353) and we’ve added a new summary of that section in the Results (page 17, lines 424-425 and 432-435). In brief, a visual hash tag appeared on a small subset of images and participants pressed a button when this occurred, and C1 and C2 objects were presented in separate scans (as were A and B objects) to minimize inflated neural similarity due to temporal proximity.

      Reviewer #2 (Public Review):

      The manuscript by Tompary & Davachi presents results from two experiments, one behavior only and one fMRI plus behavior. They examine the important question of how two separate object memories (C1 and C2) that are never experienced together in time become linked by shared predictive cues in a sequence (A followed by B followed by one of the C items). The authors developed an implicit priming task that provides a novel behavioral metric for such integration. They find significant C1-C2 priming for sequences that were learned 24h prior to the test, but not for recently learned sequences, suggesting that associative links between the two originally separate memories emerge over an extended period of consolidation. The fMRI study relates this behavioral integration effect to two neural metrics: pattern similarity changes in the medial prefrontal cortex (mPFC) as a measure of neural integration, and changes in hippocampal-LOC connectivity as a measure of post-learning consolidation. While fMRI patterns in mPFC overall show differentiation rather than integration (i.e., C1-C2 representational distances become larger), the authors find a robust correlation such that increasing pattern similarity in mPFC relates to stronger integration in the priming test, and this relationship is again specific to remote memories. Moreover, connectivity between the posterior hippocampus and LOC during post-learning rest is positively related to the behavioral integration effect as well as the mPFC neural similarity index, again specifically for remote memories. Overall, this is a coherent set of findings with interesting theoretical implications for consolidation theories, which will be of broad interest to the memory, learning, and predictive coding communities.

      Strengths:

      1) The implicit associative priming task designed for this study provides a promising new tool for assessing the formation of mnemonic links that influence behavior without explicit retrieval demands. The authors find an interesting dissociation between this implicit measure of memory integration and more commonly used explicit inference measures: a priming effect on the implicit task only evolved after a 24h consolidation period, while the ability to explicitly link the two critical object memories is present immediately after learning. While speculative at this point, these two measures thus appear to tap into neocortical and hippocampal learning processes, respectively, and this potential dissociation will be of interest to future studies investigating time-dependent integration processes in memory.

      2) The experimental task is well designed for isolating pre- vs post-learning changes in neural similarity and connectivity, including important controls of baseline neural similarity and connectivity.

      3) The main claim of a consolidation-dependent effect is supported by a coherent set of findings that relate behavioral integration to neural changes. The specificity of the effects on remote memories makes the results particularly interesting and compelling.

      4) The authors are transparent about unexpected results, for example, the finding that overall similarity in mPFC is consistent with a differentiation rather than an integration model.

      Thank you for the positive comments!

      Weaknesses:

      1) The sequence learning and recognition priming tasks are cleverly designed to isolate the effects of interest while controlling for potential order effects. However, due to the complex nature of the task, it is difficult for the reader to infer all the transition probabilities between item types and how they may influence the behavioral priming results. For example, baseline items (BL) are interspersed between repeated sequences during learning, and thus presumably can only occur before an A item or after a C item. This seems to create non-random predictive relationships such that C is often followed by BL, and BL by A items. If this relationship is reversed during the recognition priming task, where the sequence is always BL-C1-C2, this violation of expectations might slow down reaction times and deflate the baseline measure. It would be helpful if the manuscript explicitly reported transition probabilities for each relevant item type in the priming task relative to the sequence learning task and discussed how a match vs mismatch may influence the observed priming effects.

      We have added a table of transition probabilities across the learning, recognition priming, and exposure scans (now Table 1, page 48). We have also included some additional description of the change in transition probabilities across different tasks in the Methods section. Specifically, if participants are indeed learning item types and rules about their order, then both the control and the primed conditions would violate that order. Since C1 and C2 items never appeared together, viewing C1 would give rise to an expectation of seeing a BL item, which would also be violated. This suggests that our priming effects are driven by sequence-specific relationships rather than learning of the probabilities of different item types. We’ve added this consideration to the Methods section (page 45, lines 1212-1221).

      Another critical point to consider (and that the transition probabilities do not reflect) is that during learning, while a C item is followed by either an A or a BL item, the specific A or BL item that follows varies. In contrast, a given A is always followed by the same B object, which is always followed by one of two C objects. While the order of item types is semi-predictable, the order of the objects (specific items) themselves is not. This can be seen in the response times during learning, such that response times for A and BL items are always slower than for B and C items. We have explained this nuance in the figure text for Table 1.
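      As a rough illustration of how type-level transition probabilities of this kind can be tabulated, the sketch below counts transitions between item types in a stream. The two streams are hypothetical toy sequences that only loosely follow the task structure described here (ABC triplets with interspersed BL items during learning; BL-C1-C2 orderings during priming); they are not the actual trial orders, and the values in Table 1 of the manuscript should be taken as authoritative.

```python
from collections import Counter, defaultdict

def transition_probabilities(item_types):
    """Estimate P(next type | current type) from an ordered stream of item types."""
    counts = defaultdict(Counter)
    for current, following in zip(item_types, item_types[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

# Hypothetical toy streams (NOT the actual trial orders from the study).
learning_stream = ["A", "B", "C", "BL", "A", "B", "C",
                   "A", "B", "C", "BL", "A", "B", "C"]
priming_stream = ["BL", "C", "C", "BL", "C", "C"]

learning = transition_probabilities(learning_stream)
priming = transition_probabilities(priming_stream)

for name, table in (("learning", learning), ("priming", priming)):
    for current, followers in table.items():
        row = ", ".join(f"{nxt}: {p:.2f}" for nxt, p in sorted(followers.items()))
        print(f"{name:8s} {current:>2s} -> {row}")
```

      In the toy learning stream a C item is never followed by another C item, whereas in the toy priming stream it often is, which is the kind of mismatch in type-level expectations discussed above.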

      2) The choice of what regions of interest to include in the different sets of analyses could be better motivated. For example, even though briefly discussed in the intro, it remains unclear why the posterior but not the anterior hippocampus is of interest for the connectivity analyses, and why the main target is LOC, not mPFC, given past results including from this group (Tompary & Davachi, 2017). Moreover, for readers not familiar with this literature, it would help if references were provided to suggest that a predictable > unpredictable contrast is well suited for functionally defining mPFC, as done in the present study.

      We have clarified our reasoning for each of these choices throughout the manuscript and believe that our logic is now much more transparent. For an expanded reasoning of why we were motivated to look at posterior and not anterior hippocampus, see pages 6-7, lines 135-159, and our response to R2. In brief, past research focusing on post-encoding connectivity with the hippocampus suggests that the posterior aspect is more likely to couple with category-selective cortex after learning neutral, non-rewarded objects much like the stimuli used in the present study.

      We also clarify our reasoning for LOC over mPFC. While theoretically, mPFC is thought to be a candidate region for coupling with the hippocampus during consolidation, the bulk of empirical work to date has revealed post-encoding connectivity between the hippocampus and category-selective cortex in the ventral and occipital lobes (page 6, lines 123-134).

      As for the use of the predictable > unpredictable contrast for functionally defining cortical regions, we reasoned that cortical regions that were sensitive to the temporal regularities generated by the sequences may be further involved in their offline consolidation and long-term storage (Danker & Anderson, 2010; Davachi & Danker, 2013; McClelland et al., 1995). We have added this justification to the Methods section (page 18, lines 454-460).

      3) Relatedly, multiple comparison corrections should be applied in the fMRI integration and connectivity analyses whenever the same contrast is performed on multiple regions in an exploratory manner.

      We now correct for multiple comparisons using Bonferroni correction, and this correction depends on the number of regions in which each analysis is conducted. Please see page 55, lines 1483-1490, in the Methods section for details of each analysis.
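      For readers unfamiliar with the procedure, a minimal sketch of Bonferroni correction across regions follows. The region labels and p-values are hypothetical placeholders rather than results from the study, and the function name is an assumption; the exact set of regions entering each correction is specified in the Methods section cited above.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-correct a family of tests.

    Returns {label: (adjusted_p, significant)}, where adjusted_p is
    min(p * m, 1) and significance is judged against alpha / m.
    """
    m = len(p_values)
    return {
        label: (min(p * m, 1.0), p < alpha / m)
        for label, p in p_values.items()
    }

# Hypothetical uncorrected p-values for the same contrast in three regions.
uncorrected = {"region_1": 0.012, "region_2": 0.030, "region_3": 0.400}

corrected = bonferroni(uncorrected)
for label, (adjusted_p, significant) in corrected.items():
    print(f"{label}: adjusted p = {adjusted_p:.3f}, significant = {significant}")
```

      The corrected threshold depends on the number of regions in the family, so the same uncorrected p-value can survive correction in an analysis run on two regions but not in one run on three.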

      Reviewer #3 (Public Review):

      The authors of this manuscript sought to illuminate a link between a behavioral measure of integration and neural markers of cortical integration associated with systems consolidation (post-encoding connectivity, change in representational neural overlap). To that aim, participants incidentally encoded sequences of objects in the fMRI scanner. Unbeknownst to participants, the first two objects of the presented ABC triplet sequences overlapped for a given pair of sequences. This allowed the authors to probe the integration of unique C objects that were never directly presented in the same sequence, but which shared the same preceding A and B objects. They encoded one set of objects on Day 1 (remote condition), another set of objects 24 hours later (recent condition) and tested implicit and explicit memory for the learned sequences on Day 2. They additionally collected baseline and post-encoding resting-state scans. As their measure of behavioral integration, the authors examined reaction time during an Old/New judgement task for C objects depending on if they were preceded by a C object from an overlapping sequence (primed condition) versus a baseline object. They found faster reaction times for the primed objects compared to the control condition for remote but not recently learned objects, suggesting that the C objects from overlapping sequences became integrated over time. They then examined pattern similarity in a priori ROIs as a measure of neural integration and found that participants showing evidence of integration of C objects from overlapping sequences in the medial prefrontal cortex for remotely learned objects also showed a stronger implicit priming effect between those C objects over time. When they examined the change in connectivity between their ROIs after encoding, they also found that connectivity between the posterior hippocampus and lateral occipital cortex correlated with larger priming effects for remotely learned objects, and that lateral occipital connectivity with the medial prefrontal cortex was related to neural integration of remote objects from overlapping sequences.

      The authors' aim to provide evidence of a relationship between behavioral and neural measures of integration with consolidation is interesting, important, and difficult to achieve given the longitudinal nature of studies required to answer this question. Strengths of this study include a creative behavioral task, and solid modelling approaches for fMRI data with careful control for several known confounds such as BOLD activation on pattern analysis results, motion, and physiological noise. The authors replicate their behavioral observations across two separate experiments, one of which included a large sample size, and found similar results that speak to the reliability of the observed behavioral phenomenon. In addition, they document several correlations between neural measures and task performance, lending functional significance to their neural findings.

      Thank you for this positive assessment of our study!

      However, this study is not without notable weaknesses that limit the strength of the manuscript. The authors report a behavioral priming effect suggestive of integration of remote but not recent memories, leading to the interpretation that the priming effect emerges with consolidation. However, they did not observe a reliable interaction between the priming condition and learning session (recent/remote) on reaction times, meaning that the priming effect for remote memories was not reliably greater than that observed for recent. In addition, the emergence of a priming effect for remote memories does not appear to be due to faster reaction times for primed targets over time (the condition of interest), but rather, slower reaction times for control items in the remote condition compared to recent. These issues limit the strength of the claim that the priming effect observed is due to C items of interest being integrated in a consolidation-dependent manner.

      We acknowledge that the lack of a day by condition interaction in the behavioral priming effect should be discussed and now discuss these data in a more nuanced manner. While it’s true that the priming effect emerges due to a slowing of the control items over time, this slowing is consistent with classic time-dependent effects demonstrating slower response times for more delayed memories. The fact that the response times in the primed condition do not show this slowing can be interpreted as a protection against the slowing that would otherwise occur. Please see page 29, lines 758-766, for this added discussion.

      Similarly, the interactions between neural variables of interest and learning session needed to strongly show a significant consolidation-related effect in the brain were sometimes tenuous. There was no reliable difference in neural representational pattern analysis fit to a model of neural integration between the short and long delays in the medial prefrontal cortex or lateral occipital cortex, nor was the posterior hippocampus-lateral occipital cortex post-encoding connectivity correlation with subsequent priming significantly different for recent and remote memories. While the relationship between integration model fit in the medial prefrontal cortex and subsequent priming (which was significantly different from that occurring for recent memories) was one of the stronger findings of the paper in favor of a consolidation-related effect on behavior, is it possible that lack of a behavioral priming effect for recent memories due to possible issues with the control condition could mask a correlation between neural and behavioral integration in the recent memory condition?

      While we acknowledge the lack of a statistically reliable interaction between neural measures and behavioral priming in many cases, we are heartened by the reliable difference in the relationship between mPFC similarity and priming over time, which was our main planned prediction. In addition to adding caveats in the discussion about the neural measures and behavioral findings in the recent condition (see our response to R1.1 and R1.4 for more details), we have added language throughout the manuscript noting the need to interpret these data with caution.

      These limitations are especially notable when one considers that priming does not classically require a period of prolonged consolidation to occur, and prominent models of systems consolidation rather pertain to explicit memory. While the authors have provided evidence that neural integration in the medial prefrontal cortex, as well as post-encoding coupling between the lateral occipital cortex and posterior hippocampus, are related to faster reaction times for primed objects of overlapping sequences compared to their control condition, more work is needed to verify that the observed findings indeed reflect consolidation dependent integration as proposed.

      We agree that more work is needed to provide converging evidence for these novel findings. However, we wish to counter the notion that systems consolidation models are relevant only for explicit memories. Although models of systems consolidation often mention transformations from episodic to semantic memory, the critical mechanisms that define the models involve changes in the neural ensembles of a memory that is initially laid down in the hippocampus and is taught to cortex over time. This transformation of neural traces is not specific to explicit/declarative forms of memory. For example, implicit statistical learning initially depends on intact hippocampal function (Schapiro et al., 2014) and improves over consolidation (Durrant et al., 2011, 2013; Kóbor et al., 2017).

      Second, while there are many classical findings of priming during or immediately after learning, there are several instances of priming used to measure consolidation-related changes to newly learned information. For instance, priming has been used as a measure of lexical integration, demonstrating that new word learning benefits from a night of sleep (Wang et al., 2017; Gaskell et al., 2019) or a 1-week delay (Tamminen & Gaskell, 2013). The issue is not whether priming can occur immediately, it is whether priming increases with a delay.

      Finally, it is helpful to think about models of memory systems that divide memory representations not by their explicit/implicit nature, but along other important dimensions such as their neural bases, their flexibility vs rigidity, and their capacity for rapid vs slow learning (Henke, 2010). Considering this evidence, we suggest that systems consolidation models are most useful when considering how transformations in the underlying neural memory representation affect its behavioral expression, rather than focusing on the extent that the memory representation is explicit or implicit.

      With all this said, we have added text to the discussion reminding the reader that there was no statistically significant difference in priming as a function of the delay (page 29, lines 764-766). However, we are encouraged by the fact that the relationship between priming and mPFC neural similarity was significantly stronger for remotely learned objects relative to recently learned ones, as this is directly in line with systems consolidation theories.

      References

      Abolghasem, Z., Teng, T. H.-T., Nexha, E., Zhu, C., Jean, C. S., Castrillon, M., Che, E., Di Nallo, E. V., & Schlichting, M. L. (2023). Learning strategy differentially impacts memory connections in children and adults. Developmental Science, 26(4), e13371. https://doi.org/10.1111/desc.13371

      Dobbins, I. G., Schnyer, D. M., Verfaellie, M., & Schacter, D. L. (2004). Cortical activity reductions during repetition priming can result from rapid response learning. Nature, 428(6980), 316–319. https://doi.org/10.1038/nature02400

      Durrant, S. J., Cairney, S. A., & Lewis, P. A. (2013). Overnight consolidation aids the transfer of statistical knowledge from the medial temporal lobe to the striatum. Cerebral Cortex, 23(10), 2467–2478. https://doi.org/10.1093/cercor/bhs244

      Durrant, S. J., Taylor, C., Cairney, S., & Lewis, P. A. (2011). Sleep-dependent consolidation of statistical learning. Neuropsychologia, 49(5), 1322–1331. https://doi.org/10.1016/j.neuropsychologia.2011.02.015

      Gaskell, M. G., Cairney, S. A., & Rodd, J. M. (2019). Contextual priming of word meanings is stabilized over sleep. Cognition, 182, 109–126. https://doi.org/10.1016/j.cognition.2018.09.007

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7), 523–532. https://doi.org/10.1038/nrn2850

      Kóbor, A., Janacsek, K., Takács, Á., & Nemeth, D. (2017). Statistical learning leads to persistent memory: Evidence for one-year consolidation. Scientific Reports, 7(1), 760. https://doi.org/10.1038/s41598-017-00807-3

      Kuhl, B. A., & Chun, M. M. (2014). Successful remembering elicits event-specific activity patterns in lateral parietal cortex. The Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Richter, F. R., Chanales, A. J. H., & Kuhl, B. A. (2016). Predicting the integration of overlapping memories by decoding mnemonic processing states during learning. NeuroImage, 124, Part A, 323–335. https://doi.org/10.1016/j.neuroimage.2015.08.051

      Schapiro, A. C., Gregory, E., Landau, B., McCloskey, M., & Turk-Browne, N. B. (2014). The necessity of the medial-temporal lobe for statistical learning. Journal of Cognitive Neuroscience, 1–12. https://doi.org/10.1162/jocn_a_00578

      Schlichting, M. L., & Preston, A. R. (2014). Memory reactivation during rest supports upcoming learning of related content. Proceedings of the National Academy of Sciences, 111(44), 15845–15850. https://doi.org/10.1073/pnas.1404396111

      Smith, J. F., Alexander, G. E., Chen, K., Husain, F. T., Kim, J., Pajor, N., & Horwitz, B. (2010). Imaging systems level consolidation of novel associate memories: A longitudinal neuroimaging study. NeuroImage, 50(2), 826–836. https://doi.org/10.1016/j.neuroimage.2009.11.053

      Takashima, A., Nieuwenhuis, I. L. C., Jensen, O., Talamini, L. M., Rijpkema, M., & Fernández, G. (2009). Shift from hippocampal to neocortical centered retrieval network with consolidation. The Journal of Neuroscience, 29(32), 10087–10093. https://doi.org/10.1523/JNEUROSCI.0799-09.2009

      Tamminen, J., & Gaskell, M. G. (2013). Novel word integration in the mental lexicon: Evidence from unmasked and masked semantic priming. The Quarterly Journal of Experimental Psychology, 66(5), 1001–1025. https://doi.org/10.1080/17470218.2012.724694

      van Kesteren, M. T. R., Fernández, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proceedings of the National Academy of Sciences, 107(16), 7550–7555. https://doi.org/10.1073/pnas.0914892107

      Wang, H.-C., Savage, G., Gaskell, M. G., Paulin, T., Robidoux, S., & Castles, A. (2017). Bedding down new words: Sleep promotes the emergence of lexical competition in visual word recognition. Psychonomic Bulletin & Review, 24(4), 1186–1193. https://doi.org/10.3758/s13423-016-1182-7

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this important paper, the authors propose a computational model for understanding how the dynamics of neural representations may lead to specific patterns of errors as observed in working memory tasks. The paper provides solid evidence showing how a two-area model of sensory-memory interactions can account for the error patterns reported in orientation estimation tasks with delays. By integrating ideas from efficient coding and attractor networks, the resulting theoretical framework is appealing, and nicely captures some basic patterns of behavior data and the distributed nature of memory representation as reported in prior neurophysiological studies. The paper can be strengthened if (i) further analyses are conducted to deepen our understanding of the circuit mechanisms underlying the behavior effects; (ii) the necessity of the two-area network model is better justified; (iii) the nuanced aspects of the behavior that are not captured by the current model are discussed in more detail.

      We thank the Editors and Reviewers for their constructive comments. In response to the suggestions provided, we have implemented the following revisions:

      - Clarified the origin of the specific pattern of diffusion: We showed that variance patterns remain consistent across different noise types or levels in new Figure 5 – Figure supplement 2 and Figure 9 – Figure supplement 1 (uniform Gaussian noise with varying strengths). This is connected to the representation geometry induced by heterogeneous connections (Eq. 21).

      - Provided an intuitive explanation of the two-module network’s advantages: Additional simulations demonstrated that the degree of heterogeneity of the sensory connections and the intermodal connection strengths affect the drift and diffusion terms differently (new Figure 6). This provides an extra degree of freedom for controlling heterogeneity in the drift and diffusion terms in the two-module network (new Figure 9).

      - Addressed a limitation and future directions in the Discussion: Our study is limited to the dynamic evolution of memory representation for a single orientation stimulus and its associated error patterns. We acknowledge the need for further investigation to capture nuanced error patterns in broader experimental settings, such as changes in error patterns for varying stimulus presentation durations in perception tasks. We have discussed potential extensions, such as incorporating more biologically plausible baseline activities, external noise, or variations of loss functions.

      Additionally, we showed consistent error patterns when decoded from activities of the sensory module (Figure 4 – Figure supplement 1), and incorrect error patterns with autapses in the sensory module (Figure 7 – Figure supplement 2). Below, we have reorganized each Reviewer’s comments and addressed them separately. All changes are shown in red in the manuscript submitted as Related Manuscript File.

      Reviewer #1:

      Summary:

      Working memory is imperfect - memories accrue errors over time and are biased towards certain identities. For example, previous work has shown memory for orientation is more accurate near the cardinal directions (i.e., variance in responses is smaller for horizontal and vertical stimuli) while being biased towards diagonal orientations (i.e., there is a repulsive bias away from horizontal and vertical stimuli). The magnitude of errors and biases increases the longer an item is held in working memory and when more items are held in working memory (i.e., working memory load is higher). Previous work has argued that biases and errors could be explained by increased perceptual acuity at cardinal directions. However, these models are constrained to sensory perception and do not explain how biases and errors increase over time in memory. The current manuscript builds on this work to show how a two-layer neural network could integrate errors and biases over a memory delay. In brief, the model includes a 'sensory' layer with heterogeneous connections that lead to the repulsive bias and decreased error in the cardinal directions. This layer is then reciprocally connected with a classic ring attractor layer. Through their reciprocal interactions, the biases in the sensory layer are constantly integrated into the representation in memory. In this way, the model captures the distribution of biases and errors for different orientations that have been seen in behavior and their increasing magnitude with time. The authors compare the two-layer network to a simpler one-layer network model, showing that the one-layer network is harder to tune and shows an attractive bias for memories that have lower error (which is incompatible with empirical results).

      Strengths:

      The manuscript provides a nice review of the dynamics of items in working memory, showing how errors and biases differ across stimulus space. The two-layer neural network model is able to capture the behavioral effects as well as relate to neurophysiological observations that memory representations are distributed across the sensory cortex and prefrontal cortex.

      The authors use multiple approaches to understand how the network produces the observed results. For example, analyzing the dynamics of memories in the low-dimensional representational space of the networks provides the reader with an intuition for the observed effects.

      As a point of comparison with the two-layer network, the authors construct a heterogeneous one-layer network (analogous to a single memory network with embedded biases). They argue that such a network is incapable of capturing the observed behavioral effects but could potentially explain biases and noise levels in other sensory domains where attractive biases have lower errors (e.g., color).

      The authors show how changes in the strength of Hebbian learning of excitatory and inhibitory synapses can change network behavior. This argues for relatively stronger learning in inhibitory synapses, an interesting prediction.

      The manuscript is well-written. In particular, the figures are well done and nicely schematize the model and the results.

      Overall:

      Overall, the manuscript was successful in building a model that captured the biases and noise observed in working memory. This work complements previous studies that have viewed these effects through the lens of optimal coding, extending these models to explain the effects of time in memory. In addition, the two-layer network architecture extends previous work with similar architectures, adding further support to the distributed nature of working memory representations.

      We appreciate the reviewer’s comments that the work successfully explains error patterns of working memory, extends previous models of optimal coding to include temporal effects, and supports the distributed nature of working memory representations. Below, we address the specific concerns of the reviewer.

      Weaknesses:

      Despite its strengths, the manuscript does have some weaknesses.

      Major Point 1: First, as far as we can tell, behavioral data is only presented in schematic form. This means some of the nuances of the effects are lost. It also means that the model is not directly capturing behavioral effects. Therefore, while providing insight into the general phenomenon, the current manuscript may be missing some important aspects of the data.

      Relatedly, the models are not directly fit to behavioral data. This makes it hard for the authors to exclude the possibility that there is a single network model that could capture the behavioral effects. In other words, it is hard to support the authors' conclusion that "....these evolving errors...require network interaction between two distinct modules." (from the abstract, but similar comments are made throughout the manuscript). Such a strong claim needs stronger evidence than what is presented. Fitting to behavioral data could allow the authors to explore the full parameter space for both the one-layer and two-layer network architectures.

      In addition, directly comparing the ability of different model architectures to fit behavioral data would allow for quantitative comparison between models. Such quantitative comparisons are currently missing from the manuscript.

      We agree with the reviewer that incorporating quantitative comparisons to the data will strengthen our results. However, we note the limitations in fitting network models to behavioral data. Previous studies employed drift-diffusion models to fit error patterns observed in visual working memory tasks (Panichello, DePasquale et al. 2019, Gu, Lee et al. 2023). In contrast to these phenomenological models, network models have more parameters, which can cause overfitting. Consequently, we focused on comparing the qualitative differences between one-module and two-module networks, examining whether each network can generate the correct shape of bias and variance patterns. In response to the reviewers’ suggestions, we have revised the manuscript to reinforce our claim by providing an intuitive explanation of the qualitative differences between these two models (see response to your Major Point 3) and conducting additional simulations to support our claim that error patterns are consistent under different noise types or levels (see responses to Major Point 2 of Reviewer 2 and Minor Point 1 of Reviewer 3).

      Major Point 2: To help broaden the impact of the paper, it would be helpful if the authors provided insight into how the observed behavioral biases and/or network structures influence cognition. For example, previous work has argued that biases may counteract noise, leading to decreased variance at certain locations. Is there a similar normative explanation for why the brain would have repulsive biases away from commonly occurring stimuli? Are they simply a consequence of improved memory accuracy? Why isn't this seen for all stimulus domains?

      Previous work has found both diffusive noise and biases increase with the number of items in working memory. It isn't clear how the current model would capture these effects. The authors do note this limitation in the Discussion, but it remains unclear how the current model can be generalized to a multi-item case.

      As pointed out by the reviewer, attractors counteract noise and lead to reduced variance around the attracting locations. However, most attractor models reporting such effects did not consider the interaction of attractor dynamics with the sensory network. For the repulsive biases considered here, previous studies on the sensory stage have theoretically demonstrated that they could lower the discrimination threshold around cardinal orientations (e.g., see Wei and Stocker, 2017). In Wei and Stocker (2017), the authors showed that this relationship between bias and discrimination threshold was observed across many stimulus modalities. In the present study, we demonstrated that the bias and variability patterns naturally emerged from the underlying neural dynamics. Nonetheless, we also noted that color working memory shows attractive biases, which necessitates further study of the underlying neural mechanisms of color perception. A plausible explanation is that the categorical effect dominates color perception and memory processes, as suggested by existing modelling work (Tajima et al., 2016).

      However, we do note the limitation of our current work, which does not capture nuanced error patterns in broader experimental settings, such as variations of perception tasks or memory of multiple items. For instance, while shorter stimulus presentations with no explicit delay lead to larger biases experimentally, our current model, which starts activities from a flat baseline, shows an increase in bias throughout the stimulus presentation. Additionally, the error variance during stimulus presentation is almost negligible compared to that during the delay period, as the external input overwhelms the internal noise. These mismatches during stimulus presentation have minimal impact on activities during the delay period when the internal dynamics dominate. Nonetheless, the model needs further refinement to accurately reproduce activities during stimulus presentation, possibly by incorporating more biologically plausible baseline activities. Also, a recent Bayesian perception model suggested that different types of noise, such as external noise, or variations in loss functions that adjust tolerance to small errors may help explain various error patterns observed across different modalities (Hahn and Wei, 2024). Even for memories involving multiple items, noise can be critical in determining error patterns, as encoding more items might be equivalent to higher noise for each individual item (Chunharas, Rademaker et al. 2022).

      To make this limitation clear, we included the above response in a new paragraph on limitations and future directions in the Discussion (2nd paragraph in p. 11). Also, we modified the text that previously described that our model can “explain error patterns in both perception and working memory tasks” in p. 3 and p. 5 as

      “explain error patterns in working memory tasks that are similar to those observed in perception tasks.”

      And we added the bias and variance pattern right after the stimulus offset in Figure 4C,D with the following note in p. 6:

      “Note that the variance of errors is nearly zero during stimulus presentation because the external input overwhelms internal noise, which does not fully account for the variability observed during perception tasks (see Discussion).”

      Major Point 3: The role of the ring attractor memory network isn't completely clear. There is noise added in this stage, but how is this different from the noise added at the sensory stage? Shouldn't these be additive? Is the noise necessary?  

      Similarly, it isn't clear whether the memory network is necessary - can it be replaced by autapses (self-connections) in the sensory network to stabilize its representation? In short, it would be helpful for the authors to provide an intuition for why the addition of the memory network facilitates the repulsive bias.

      Internal noise in the circuits is necessary to replicate the variability of the readout in estimating the stimulus because our model did not incorporate external noise (i.e., noise associated with the stimulus). We note that noise is implemented differently in the extension of the previous Bayesian model (Fig. 2) and in the network models (Fig. 3 and beyond). In Fig. 2, we followed previous studies by employing static tuning curves for the sensory module and Poisson noise to account for variability in the perception stage. In the memory stage, sensory output undergoes the addition of constant Gaussian noise, replicating the diffusion process along the memory manifolds as shown in traditional memory network models. In the network models, we do consider the same noise in both sensory and memory modules, subjecting all units to Poisson noise to simulate neuronal spiking variability. In the network models, the two modules dynamically interact, which warps the energy landscape and generates uneven noise coefficients along the memory manifold, reminiscent of the conditions shown in Fig. 1.

      From the bias and variance patterns, we can infer two requirements for the network to fulfill: one is efficient coding, suggested by the sensory perception stage, and the other is memory maintenance. The former is achieved by realizing the previous Bayesian models in the sensory networks with specific heterogeneous connections. In our work, the latter is achieved by strong recurrent connections that sustain persistent activity during the delay period. On the other hand, as the reviewer noted, memory can be maintained through autapses in the sensory network, which is equivalent to elongating the intrinsic time constants of individual units (Seung, Lee et al. 2000). We simulated such a sensory network and showed the results in Figure 7 – Figure Supplement 2. As shown in the figure, a larger time constant also slows down the increase in bias significantly, which can be deduced from Eq. 20.
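      The equivalence between an autapse and a longer intrinsic time constant can be checked on a single linear rate unit. The following is a minimal sketch, not the network used in the manuscript: the unit obeys tau * dr/dt = -r + w_self * r + drive, and the names `simulate_unit`, `w_self`, and `drive`, as well as all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_unit(tau, w_self, drive, t_end=2.0, dt=1e-3, r0=0.0):
    """Forward-Euler integration of tau * dr/dt = -r + w_self * r + drive."""
    n_steps = int(round(t_end / dt))
    r = np.empty(n_steps + 1)
    r[0] = r0
    for k in range(n_steps):
        r[k + 1] = r[k] + dt / tau * (-r[k] + w_self * r[k] + drive)
    return r

# Hypothetical values: 20 ms intrinsic time constant, autapse weight 0.9.
tau, w_self, drive = 0.02, 0.9, 1.0
with_autapse = simulate_unit(tau, w_self, drive)

# Same dynamics without an autapse: the time constant and the input are
# both scaled by 1 / (1 - w_self), giving an effective time constant of 0.2 s.
slow_unit = simulate_unit(tau / (1.0 - w_self), 0.0, drive / (1.0 - w_self))

same_trajectory = np.allclose(with_autapse, slow_unit)
```

      Under these assumptions the two traces coincide, so a self-connection of weight `w_self` acts like a unit whose time constant is `tau / (1 - w_self)`. Anything that evolves on this slower time scale, including a bias that builds up over the delay, is slowed by the same factor.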

      When memory is maintained through strong recurrent connections, there are two possible scenarios: a one-module network combining both efficient coding and memory maintenance (Fig. 8), or a two-module network satisfying each condition in a different module (Fig. 7). In both networks, heterogeneous connections achieving efficient coding shape drift and diffusion dynamics similarly, as illustrated in Figure 9 (previous Figure 7 – Supplement 1). Discrete attractors are formed near oblique orientations, inducing an increase of repulsive bias during the delay period. Also, the noise coefficient is lowest at cardinal orientations. However, there is a difference in the degree of asymmetry of the drift and diffusion at cardinal and oblique orientations: the one-module network shows larger asymmetry in potential energy, while the two-module network shows larger asymmetry in the noise coefficient. These varying degrees of heterogeneity in drift and diffusion lead to qualitative differences in bias and variance patterns in estimation. Shallower potential differences with more asymmetrical noise coefficients result in correct bias and variance patterns in the two-module network, while the opposite leads to flipped variance patterns in the one-module network.
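      The qualitative argument above can be illustrated with a one-dimensional drift-diffusion process on the orientation variable. This is a sketch under stated assumptions rather than the reduced model derived in the manuscript: the potential U(θ) = depth · cos(4θ) (minima at oblique orientations), the noise coefficient D(θ) = d0 · (1 − noise_asym · cos(4θ)) (lowest at cardinal orientations), the Itô update, and every parameter value are hypothetical.

```python
import numpy as np

def simulate_errors(theta0, t_end=3.0, dt=0.01, n_trials=4000,
                    depth=0.005, d0=0.002, noise_asym=0.5, seed=0):
    """Euler-Maruyama for d(theta) = -U'(theta) dt + sqrt(2 D(theta)) dW.

    Returns the mean (bias) and variance of theta(t_end) - theta0 in radians.
    Cardinal orientations are 0 and pi/2; oblique ones are pi/4 and 3*pi/4.
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_trials, theta0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        drift = 4.0 * depth * np.sin(4.0 * theta)               # -dU/dtheta
        diff = d0 * (1.0 - noise_asym * np.cos(4.0 * theta))     # D(theta) > 0
        theta += drift * dt + np.sqrt(2.0 * diff * dt) * rng.standard_normal(n_trials)
    err = theta - theta0
    return err.mean(), err.var()

# Shallow potential, strongly asymmetric noise (two-module-like regime).
bias_near_cardinal, _ = simulate_errors(np.pi / 8)
_, var_cardinal = simulate_errors(0.0)
_, var_oblique = simulate_errors(np.pi / 4)

# Deep potential, nearly flat noise (one-module-like regime).
deep = dict(depth=0.03, noise_asym=0.1)
_, flipped_var_cardinal = simulate_errors(0.0, **deep)
_, flipped_var_oblique = simulate_errors(np.pi / 4, **deep)
```

      With these hypothetical settings, the bias at π/8 points toward the nearest oblique orientation in the first regime, and the variance is larger at the oblique than at the cardinal orientation. In the second regime the unstable fixed point at the cardinal orientation amplifies fluctuations faster than the noise asymmetry can compensate, so the variance ordering flips.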

      An intuitive explanation of how connectivity heterogeneity differentially affects the degree of asymmetry of drift and diffusion in one-module and two-module networks is detailed in our response to Major Point 3 of Reviewer 2. In summary, separating the memory module from the sensory module introduces an additional degree of freedom, allowing for more flexible control over drift and diffusion, and thereby over bias and variance patterns. To clarify this, we have added simulations in Figure 6 and Figure 9 and provided an intuitive explanation in the accompanying texts in pp. 6-7 and p. 9.

      Minor Point 1: The code is stated to be available on GitHub, but I could not access it.

      Thank you for pointing it out. The repository is now publicly available.

      Minor Point 2: The legend for late/mid/early is in an odd place in Figure 1, as it is in panel E where you can't see the difference between the lines. We would suggest moving this to another panel where the different time points are clear. In general, we would suggest adding more text (legends and titles) to the figure to help the reader understand the figures without having to refer to the details in the text and/or figure legends.

      We have now moved the legend to panel B, where late/mid/early is first introduced. Also, we added more text to the figure legends (Figures 3, 4, 5, and 8).

      Minor Point 3: The last line of the first paragraph of the Introduction ends awkwardly. I assume it's referring to indirect evidence for dynamics in memory?

      Thank you. We have modified the sentence as follows:

      “For instance, biases of errors, the systematic deviation from the original stimuli, observed in estimation tasks have been used as indirect evidence to infer changes in internal representations of stimuli.”

      Minor Point 4: Similarly, the first line of the second paragraph of the Introduction was also awkward. Specifically, the clause "..., such as nonuniform stimulus distribution in nature." Seems to be missing a 'the' before 'nonuniform'.

      We have modified the sentence as follows:

      “One important source of biases is adaptation to environmental statistics, such as the nonuniform stimulus distribution found in nature or the limited range in specific settings.”

      Reviewer #2:

      In this manuscript, Yang et al. present a modeling framework to understand the pattern of response biases and variance observed in delayed-response orientation estimation tasks. They combine a series of modeling approaches to show that coupled sensory-memory networks are in a better position than single-area models to support experimentally observed delay-dependent response bias and variance in cardinal compared to oblique orientations. These errors can emerge from a population-code approach that implements efficient coding and Bayesian inference principles and is coupled to a memory module that introduces random maintenance errors. A biological implementation of such operation is found when coupling two neural network modules, a sensory module with connectivity inhomogeneities that reflect environment priors, and a memory module with strong homogeneous connectivity that sustains continuous ring attractor function. Comparison with single-network solutions that combine both connectivity inhomogeneities and memory attractors shows that two-area models can more easily reproduce the patterns of errors observed experimentally. This, the authors take as evidence that a sensory-memory network is necessary, but I am not convinced about the evidence in support of this "necessity" condition. A more in-depth understanding of the mechanisms operating in these models would be necessary to make this point clear.

      Strengths:

      The model provides an integration of two modeling approaches to the computational bases of behavioral biases: one based on Bayesian and efficient coding principles, and one based on attractor dynamics. These two perspectives are not usually integrated consistently in existing studies, which this manuscript beautifully achieves. This is a conceptual advancement, especially because it brings together the perceptual and memory components of common laboratory tasks.

      The proposed two-area model provides a biologically plausible implementation of efficient coding and Bayesian inference principles, which interact seamlessly with a memory buffer to produce a complex pattern of delay-dependent response errors. No previous model had achieved this.

      We appreciate the reviewer’s comments that the work is a conceptual advancement, combining Bayesian perception models and attractor memory models, and produces error patterns which wasn’t achieved by previous models. Below, we address the specific concerns of the reviewer.

      Major Point 1: The correspondence between the various computational models is not fully disclosed. It is not easy to see this correspondence because the network function is illustrated with different representations for different models and the correspondence between components of the various models is not specified. For instance, Figure 1 shows that a specific pattern of noise is required in the low-dimensional attractor model, but in the next model in Figure 2, the memory noise is uniform for all stimuli. How do these two models integrate? What element in the population-code model of Figure 2 plays the role of the inhomogeneous noise of Figure 1? Also, the Bayesian model of Figure 2 is illustrated with population responses for different stimuli and delays, while the attractor models of Figures 3 and 4 are illustrated with neuronal tuning curves but not population activity. In addition, error variance in the Bayesian model appears to be already higher for oblique orientations in the first iteration whereas it is only first shown one second into the delay for the attractor model in Figure 4. It is thus unclear whether variance inhomogeneities appear already at the perceptual stage in the attractor model, as it does in the population-code model. Of course, correspondences do not need to be perfect, but the reader does not know right now how far the correspondence between these models goes.

      Thank you for pointing out the lack of clarity in the correspondence between different models. We note that noise is implemented differently in the extension of the previous Bayesian model (Fig. 2) and in the network models (Fig. 3 and beyond). In Fig. 2, we followed previous studies by employing static tuning curves for the sensory module and Poisson noise to account for variability in the perception stage. In the memory stage, sensory output undergoes the addition of constant Gaussian noise, replicating the diffusion process along the memory manifolds as shown in traditional memory network models. In the network models in Fig. 3 and beyond, we do consider the same noise in both sensory and memory modules, subjecting all units to Poisson noise to simulate neuronal spiking variability. In the network models, the two modules dynamically interact, which warps the energy landscape and generates uneven noise coefficients along the memory manifold, reminiscent of the conditions shown in Fig. 1.

      However, we do note the limitation of the current study, which cannot fully replicate behavior patterns observed in variations of perception tasks. For instance, while shorter stimulus presentations with no explicit delay lead to larger biases experimentally, our current model, which starts activities from a flat baseline, shows an increase in bias throughout the stimulus presentation. Additionally, the error variance during stimulus presentation is almost negligible compared to that during the delay period, as the external input overwhelms the internal noise. These mismatches during stimulus presentation have minimal impact on activities during the delay period when the internal dynamics dominate. Nonetheless, the model needs further refinement to accurately reproduce activities during stimulus presentation, possibly by incorporating more biologically plausible baseline activities. To make this limitation clear, we included the above response in a new paragraph on limitations and future directions in the Discussion (2nd paragraph in p. 11). Also, we modified the text that previously described that our model can “explain error patterns in both perception and working memory tasks” in p. 3 and p. 5 as “explain error patterns in working memory tasks that are similar to those observed in perception tasks.”

      And we added the bias and variance pattern right after the stimulus offset in Figure 4C,D with the following note in p. 6:

      “Note that the variance of errors is nearly zero during stimulus presentation because the external input overwhelms internal noise, which does not fully account for the variability observed during perception tasks (see Discussion).”

      Major Point 2: The manuscript does not identify the mechanistic origin in the model of Figure 4 of the specific noise pattern that is required for appropriate network function (with higher noise variance at oblique orientations). This mechanism appears critical, so it would be important to know what it is and how it can be regulated. In particular, it would be interesting to know if the specific choice of Poisson noise in Equation (3) is important. Tuning curves in Figure 4 indicate that population activity for oblique stimuli will have higher rates than for cardinal stimuli and thus induce a larger variance of injected noise in oblique orientations, based on this Poisson-noise assumption. If this explanation holds, one wonders if network inhomogeneities could be included (for instance in neural excitability) to induce higher firing rates in the cardinal/oblique orientations so as to change noise inhomogeneities independently of the bias and thus control more closely the specific pattern of errors observed, possibly within a single memory network.

      The specific pattern of the noise coefficient in the network models, with lower variability at cardinal orientations, is inherited from the previous Bayesian perception models (Wei and Stocker, 2017). In both one-module and two-module networks, the specific pattern of heterogeneous connections induces more neurons tuned to cardinal orientations with narrower tuning widths. Such a sparser representation near cardinal stimuli generates lower noise variability even with constant Gaussian noise. This is verified in Eq. 21 in Methods, which shows the derivation of the noise coefficients: with constant Gaussian noise, Eq. 21 is modified as [equation not preserved in this copy] because [expression not preserved]. Thus, 𝒟(𝜃) is inversely proportional to [expression not preserved], which reflects the length travelled on the stable trajectory 𝒔̄(𝜃) when 𝜃 increases by one unit. For a sparser representation, this quantity becomes larger and 𝒟(𝜃) is reduced. Intuitively, with more neurons tuned to cardinal stimuli, noise is averaged and reduced. In sum, the heterogeneous connection induces the specific noise coefficient, and the choice of Poisson-like noise is not essential, although it facilitates the correct variance pattern. To clarify this point, we have added the results of using uniform Gaussian noise in new Figure 5 – Figure Supplement 2 and Figure 9 – Figure Supplement 1.
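      The geometric intuition, that noise along the memory manifold shrinks where the stable trajectory moves faster per unit of θ, can be sketched numerically. The example below does not reproduce Eq. 21. It assumes isotropic Gaussian noise of standard deviation `sigma` projected orthogonally onto the tangent of the trajectory, which gives D(θ) = sigma² / |ds̄/dθ|². The population code (a homogeneous von Mises population in a warped coordinate F(θ) = θ + (warp/4)·sin(4θ)) and all names and values are hypothetical.

```python
import numpy as np

def stable_trajectory(theta, n_units=64, kappa=2.0, warp=0.5):
    """Hypothetical population response s_bar(theta), theta in [0, pi).

    Units tile a warped coordinate F(theta) uniformly, so in theta they are
    denser and more narrowly tuned where F'(theta) = 1 + warp*cos(4*theta)
    is large, i.e. at the cardinal orientations 0 and pi/2.
    """
    warped = theta + (warp / 4.0) * np.sin(4.0 * theta)
    centers = np.linspace(0.0, np.pi, n_units, endpoint=False)
    return np.exp(kappa * (np.cos(2.0 * (warped - centers)) - 1.0))

def noise_coefficient(theta, sigma=0.1, eps=1e-4):
    """D(theta) = sigma^2 / |d s_bar / d theta|^2, tangent by central difference."""
    tangent = (stable_trajectory(theta + eps) - stable_trajectory(theta - eps)) / (2.0 * eps)
    return sigma**2 / np.dot(tangent, tangent)

d_cardinal = noise_coefficient(0.0)
d_oblique = noise_coefficient(np.pi / 4)
ratio = d_oblique / d_cardinal
```

      In this sketch the trajectory speed is proportional to F'(θ), which is 1.5 at the cardinal and 0.5 at the oblique orientation, so the noise coefficient is lower at the cardinal orientation by a factor of roughly nine even though the injected noise is uniform.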

      Major point 3: The main conclusion of the manuscript, that the observed patterns of errors "require network interaction between two distinct modules" is not convincingly shown. The analyses show that there is a quantitative but not a qualitative difference between the dynamics of the single memory area compared to the sensory-memory two-area network, for specific implementations of these models (Figure 7 - Figure Supplement 1). There is no principled reasoning that demonstrates that the required patterns of response errors cannot be obtained from a different memory model on its own. Also, since the necessity of the two-area configuration is highlighted as the main conclusion of the manuscript, it is inconvenient that the figure that carefully compares these conditions is in the Supplementary Material.

      Following the reviewer’s suggestion, we moved Figure 7 – Figure supplement 1 to become the new Figure 9. As noted by the reviewer, drift dynamics and diffusion projected onto the low-dimensional memory manifold have similar shapes in both one-module and two-module networks, with the lowest potential and highest noise coefficient observed at the oblique orientations. However, there is a difference in the degree of asymmetry of the drift and diffusion at cardinal and oblique orientations: the one-module network shows larger asymmetry in potential energy, while the two-module network shows larger asymmetry in the noise coefficient. These varying degrees of heterogeneity in drift and diffusion lead to qualitative differences in bias and variance patterns in estimation. Shallower potential differences with more asymmetrical noise coefficients result in correct bias and variance patterns in the two-module network, while the opposite leads to flipped variance patterns in the one-module network.

      To intuitively understand how connectivity heterogeneity differentially affects the asymmetry degrees of drift and diffusion in one-module and two-module networks, consider a simple case where only the excitatory connection is heterogeneous, denoted as α. The asymmetry of diffusion reflects the degree of heterogeneity in either the sensory or memory modules. The noise coefficient derived from the low-dimensional projection is mainly determined by the heterogeneity of . While the one-module network, with a much lower α, shows almost flat , the two-module network shows more prominent asymmetry in with a larger α in the sensory module.  

      On the other hand, the asymmetry in the potential energy is influenced differently by the connectivity heterogeneity of the sensory module and that of the memory module. For memory maintenance, overall recurrent connections need to be strong enough to overcome intrinsic decay, simplifying to w = 1. In the one-module network, α in the memory module creates potential differences at cardinal and oblique orientations as 1 ± α. In the two-module network, by contrast, w = 1 is fulfilled by the memory module, and α in the sensory module acts as a perturbation. The effect of α is modulated by the connectivity strength between the sensory and memory modules, denoted by γ. Potential differences at cardinal and oblique orientations can be represented as 1 ± γα. While both α and γ determine the energy level, the noise coefficient depends less on γ (see response to your Major Point 4). Thus, even when a relatively larger α in the sensory module leads to more asymmetrical noise coefficients, the potential difference can be shallower in the two-module network with small γ < 1.
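      The scaling argument above can be illustrated with a minimal numerical sketch. It assumes only the simplified expressions quoted in the response (1 ± α for the one-module network and 1 ± γα for the two-module network, with w = 1). The specific values of α and γ below are hypothetical, chosen for illustration and not taken from the manuscript, and the helper name `potential_gap` is likewise an assumption.

```python
def potential_gap(effective_heterogeneity, w=1.0):
    """Gap between the two levels w*(1 + h) and w*(1 - h).

    h is the effective heterogeneity: alpha for the one-module
    network, gamma * alpha for the two-module network.
    """
    upper = w * (1.0 + effective_heterogeneity)
    lower = w * (1.0 - effective_heterogeneity)
    return upper - lower

# Hypothetical parameter values, for illustration only.
alpha_memory = 0.04    # weak heterogeneity in the single memory module
alpha_sensory = 0.08   # stronger heterogeneity in the sensory module
gamma = 0.25           # sensory-memory coupling, gamma < 1

one_module_gap = potential_gap(alpha_memory)           # 2 * alpha
two_module_gap = potential_gap(gamma * alpha_sensory)  # 2 * gamma * alpha

print(f"one-module gap: {one_module_gap:.3f}")
print(f"two-module gap: {two_module_gap:.3f}")
```

      With these illustrative numbers the two-module network has the larger heterogeneity α yet the smaller potential gap, because the gap scales with γα rather than with α alone. This is the additional degree of freedom the response refers to.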

      In sum, in the two-module network, there is an additional degree of freedom, connectivity strengths between sensory and memory modules, which provides the flexibility to control drift and diffusion separately, unlike in the one-module network. To clarify this, we have added simulations in Figure 6 and Figure 9 and provided an intuitive explanation in the accompanying texts in pp. 6-7 and p. 9.

      Major Point 4: The proposed model has stronger feedback than feedforward connections between the sensory and memory modules. This is not a common assumption when thinking about hierarchical processing in the brain, and it is not discussed in the manuscript.

      As noted in the previous response, the connectivity strengths between the sensory and memory modules, denoted as γ, are important parameters determining the qualitative features of bias and variance patterns. γ corresponds to the product of Jf and Jb, feedforward and feedback strengths, and our additional simulation shows that the bias and variance patterns remain similar for a fixed γ. Note that further simulation revealed that the heterogeneity degree, α, and the intermodal connectivity strengths, γ, influence the drift and diffusion terms differently. As this result highlights the advantage of the two-module network, we moved the dependence of error patterns on intermodal connectivity strengths to the main figure (previous Figure 5 – Figure supplement 2), which now includes more simulations showing bias and variance patterns for different Jf and Jb and for different α and Jb (new Figure 6). 

      Minor Point 1: page 11: "circular standard deviation of sigma_theta = 1.3º at cardinal orientations" but in Figure 2 we see sigma_theta = 2º at cardinal orientations.

      The circular standard deviation of 𝜎𝜃 = 1.3º refers to the standard deviation of the sensory module output in iteration 1, that is, before feeding into the memory module to complete this iteration. In Figure 2, the standard deviation plotted is that of the output of the memory module, which has a Gaussian memory noise with standard deviation 1.3º added on top of the sensory output. Hence we see a standard deviation of √(1.3² + 1.3²) = 1.84º, which appears close to 2º in the figure. We added a sentence in this paragraph of Methods (p. 13) to avoid confusion.
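      The arithmetic in this reply can be checked directly. The sketch below assumes, as the reply implies, that the sensory output noise and the added memory noise are independent, so their variances add, and that at these small angles the circular standard deviation can be treated like an ordinary one. The variable names are illustrative.

```python
import math

sensory_sd_deg = 1.3       # SD of the sensory module output (iteration 1)
memory_noise_sd_deg = 1.3  # SD of the Gaussian memory noise added on top

# Independent noise sources: variances add, so SDs combine in quadrature.
combined_sd_deg = math.hypot(sensory_sd_deg, memory_noise_sd_deg)

print(round(combined_sd_deg, 2))  # 1.84
```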

      Minor Point 2: equation (19): What does the prime of ||s'(theta)|| mean?

      The prime represents taking the derivative with respect to θ:

      𝒔′(𝜃) = d𝒔(𝜃)/d𝜃

      ‖𝒔′(𝜃)‖ reflects the length travelled on the stable trajectory when θ increases by one unit. Because this quantity is plotted in Figure 9 and Figure 5 – Figure supplement 2, we have clarified it in the legends.

      Minor Point 3: page 15: "The Fisher information (F) is estimated by assuming that the likelihood function p(r|theta) is Gaussian", but the whole point of Wei and Stocker (2015) and your Figure 2 is that likelihoods are skewed in these networks. This could be clarified.

      Thank you for pointing out the lack of clarity. In Wei and Stocker (2015) and our Figure 2, the likelihood is skewed with respect to 𝜃 (note the horizontal axes). However, in the Methods section, we assumed the distribution function 𝑝(𝑟|𝜃) is Gaussian with respect to 𝑟 when 𝜃 is considered fixed:

      where . The distribution function is skewed with respect to 𝜃 because the tuning curves are skewed with respect to 𝜃 (see Figure 4B). We have clarified our assumption in p. 16 to avoid confusion.

      Reviewer #3:

      Summary:

      The present study proposes a neural circuit model consisting of coupled sensory and memory networks to explain the circuit mechanism of the cardinal effect in orientation perception which is characterized by the bias towards the oblique orientation and the largest variance at the oblique orientation.

      Strengths:

      The authors have done numerical simulations and preliminary analysis of the neural circuit model to show the model successfully reproduces the cardinal effect. And the paper is well-written overall. As far as I know, most of the studies on the cardinal effect are at the level of statistical models, and the current study provides one possibility of how neural circuit models reproduce such an effect.

      We appreciate the reviewer’s comments that the work successfully reproduces error patterns through circuit models, advancing beyond previous statistical models. Below, we address the specific concerns of the reviewer.

      Weaknesses:

      There are no major weaknesses and flaws in the present study, although I suggest the author conduct further analysis to deepen our understanding of the circuit mechanism of the cardinal effects. Please find my recommendations for concrete comments.

      Minor Point 1: Likely, the interplay of the potential function (Figure 5D) and the noise amplitude (Figure 5C) in the memory network is the key to reproducing the cardinal effect. For me, it is obvious to understand the spatial profile of the potential function as what it currently looks like (Figure 5D), while I haven't had an intuitive understanding of how the spatial profile of noise structure emerges from the circuit model. Therefore I suggest the authors provide a more comprehensive analysis, including theory and simulation, to demonstrate how the noise structure depends on the network parameters. I am concerned about whether the memory network can still reproduce the minimal variance at the cardinal orientation if we reduce the Fano factor of single neuron variabilities. In this case, the shape of the potential function will be dominant in determining the variance over orientation (Figure 5F) and the result might be reverted.

      Thank you for the suggestion. In both one-module and two-module networks, the specific pattern of heterogeneous connections results in more neurons being tuned to cardinal orientations, with narrower tuning widths. Such sparser representation near cardinal stimuli generates lower noise variability even with constant Gaussian noise, which is now shown in Figure 5 – Figure Supplement 2. We also showed that the distinctive error patterns in one-module and two-module networks are maintained under Gaussian noise with varying amplitude in Figure 9 – Figure supplement 1.

      Minor Point 2: In addition, it is interesting to show how the representation of the sensory module looks like, e.g., plotting the figures similar to Figures B-F but from the sensory module. I feel the sensory module doesn't have a result similar to Figure 5F. Is it?

      Yes, decoded error patterns obtained from the sensory module are similar to the results obtained from the memory module. We have added Figure 4 – Figure supplement 1 to show that our conclusions remain valid when decoding from the sensory module.

      Minor point 3: Last but not least, I have a conceptual question about the presentation mechanism in the proposed circuit model. The present study refers to Wei, et al., 2015 and 2017 about the statistical model mechanism of the cardinal effect. If I remember correctly, Wei's papers considered joint encoding and decoding processes to render the cardinal effect. Can the authors regard the processes in the proposed circuit model with the stages in the statistical model? Or at least the authors should discuss this link in the Discussions.

      We have now mentioned in the Results section (p. 6), in addition to the Discussion and Methods, that we use a population vector decoder that mimics Bayesian optimal readout. However, we acknowledge that this decoder is only optimal under a specific loss function. A recent Bayesian perception model suggested that different types of noise, such as external noise, or variations in loss functions that adjust tolerance to small errors may help explain various error patterns observed across different modalities (Hahn and Wei, 2024). We have now added this limitation in the Discussion, along with the inconsistency of the current model with experimental observations during perception tasks and future directions (p. 11).

    1. Author response:

      Reviewer #1 (Public Review):

      In this study, Girardello et al. use proteomics to reveal the membrane tension sensitive caveolin-1 interactome in migrating cells. The authors use EM and surface rendering to demonstrate that caveolae formed at the rear of migrating cells are complex membrane-linked multilobed structures, and they devise a robust strategy to identify caveolin-1 associated proteins using APEX2-mediated proximity biotinylation. This important dataset is further validated using proximity ligation assays to confirm key interactions, and follows up with an interrogation of a surprising relationship between caveolae and RhoGTPase signalling, where caveolin-1 recruits ROCK1 under high membrane tension conditions, and ROCK1 activity is required to reform caveolae upon reversion to isotonic solution. However, caveolin-1 recruits the RhoA inactivator ARHGAP29 when membrane tension is low and ARHGAP29 overexpression leads to disassembly of caveolae and reduced cell motility. This study builds on previous findings linking caveolae to positive feedback regulation of RhoA signalling, and provides further evidence that caveolae serve to drive rear retraction in migration but also possess an intrinsic brake to limit RhoA activation, leading the authors to suggest that cycles of caveolae assembly and disassembly could thereby be central to establish a stable cell rear for persistent cell migration

      A major strength of the manuscript is the robust proteomic dataset. The experimental set up is well defined and mostly well controlled, and there is good internal validation in that the high abundance of core caveolar proteins in low membrane tension (isotonic) conditions, and absence under high membrane tension (brief hypo-osmotic shock) conditions, correlate very well with previous findings. The data could however be better presented to show where statistically robust changes occur, and supplementary information should include a table showing abundance. It's very good to see a link to PRIDE, providing a useful resource for the community.

      We thank the reviewer for the positive feedback. We have included the outputs from the search engine in Supplementary File 1.

      The authors detail several known interactions and their mechanosensitivity, but also report new interactors of caveolin-1. Several mechanosensitive interactions of caveolin-1 take place at the cell rear, but others are more diffuse across the cell looking at the PLA data (e.g. FLN1, CTTN, HSPB1; Figure 4A-F and Figure 4 supplement 1). It is interesting to speculate that those at the cell rear are involved in caveolae, whilst others are linked specifically to caveolin-1 (e.g. dolines). PLA or localisation analysis with Cavin1/PTRF may be able to resolve this and further specify caveolae versus non-caveolae mechanosensitive interactions.

      We thank the reviewer for this interesting idea. It is true that many if not most proteins we identified to be associated with Cav1 are not restricted to the cell rear. To analyse to what extent the identified proteins interact with Cav1 at the rear, we reanalysed our PLA data for some of the antibody combinations we looked at. This new analysis is now shown in Fig 5G. As expected, for Cav1/PTRF and Cav1/EHD2 most PLA dots (70-80%) were found at the rear. This rear bias is also evident from the representative images we show in the Figure panels 5A and 5E. By contrast, far fewer PLA dots (~40%) were rear-localised for Cav1/CTTN and Cav1/FLNA antibody combinations. This reflects the much broader cellular distribution of these proteins compared to the core caveolae proteins, and might suggest that there are generally few links between caveolae and cortical actin. However, it is also possible that such links/interactions are more difficult to detect using PLA (because of the extended distance between caveolae and the actin cortex, or because of steric constraints).

      The Cav1/ARHGAP29 influence on YAP signalling is interesting, but appears to be quite isolated from the rest of the manuscript. Does overexpression of ARHGAP29 influence YAP signalling and/or caveolar protein expression/Cav1pY14?

      Our data and published work originally prompted us to speculate that there is a potential functional link between Cav1, YAP, and ARHGAP29. In an attempt to address this we have performed several Western blots on cell lysates from cells overexpressing ARHGAP29. We did not see major changes in Cav1 Y14 phosphorylation levels in cells overexpressing ARHGAP29, and YAP and pYAP levels also remained unchanged (not shown). In addition, based on previous literature 1,2 we expected to see an effect on ARHGAP29 mRNA levels and YAP target gene transcripts in Cav1 siRNA transfected cells. To our surprise, the mRNA levels of three independent YAP target genes and ARHGAP29 were unchanged in Cav1 siRNA treated cells (this is now shown in Figure 6 Figure Supplement 1). Our data therefore suggest that in RPE1 cells, the connection between Cav1 and ARHGAP29 is independent of YAP signalling, and that the increase in ARHGAP29 protein levels observed in Cav1 siRNA cells is due to some unknown post-translational mechanism.

      ARHGAP29 and RhoA/ROCK1 related observations are very interesting and potentially really important. However, the link between ARHGAP29 and caveolae is not well established (other than in proteomic data). PLA or FRET could help establish this.

      We agree that the physical and functional link between caveolae (or Cav1) and ARHGAP29 was not well worked out in the original manuscript. In an attempt to address this we have performed PLA assays in GFP-ARHGAP29 transfected cells (as we did not find a suitable ARHGAP29 antibody that works reliably in IF) using anti-Cav1 and anti-GFP antibodies. The PLA signal we obtained for Cav1 and ARHGAP29 was not significantly different to control PLA experiments. There was very little PLA signal to start with. This is not surprising given that ARHGAP29 localisation is mostly diffuse in the cytoplasm, whilst Cav1 is concentrated at the rear. In addition, in cases where we do see ARHGAP29 localisation at the cell cortex, Cav1 tends to be absent (this is now shown in Figure 6 – Figure Supplement 2E). In other words, with the tools we have available, we see little colocalization between Cav1 and ARHGAP29 at steady state. Altogether we speculate that ARHGAP29, through its negative effect on RhoA, flattens caveolae at the membrane or interferes with caveolae assembly at these sites.

      This of course prompts the question why ARHGAP29 was identified in the Cav1 proteome with such specificity and reproducibility in the first place? This can be explained by the way APEX2 labeling works. Proximity biotinylation with APEX2 is extremely sensitive and restricted to a labelling radius of ~20 nm 3. The labeling reaction is conducted on live and intact cells at room temperature for 1 min. Although 1 min appears short, dynamic cellular processes occur at the time scale of seconds and are ongoing during the labelling reaction. It is conceivable that within this 1 min time frame, ARHGAP29 cycles on and off the rear membrane (kiss and run). This allows ARHGAP29 to be biotinylated by Cav1-APEX2, resulting in its identification by MS. We have included this in the discussion section.

      The relationship between ARHGAP29 and RhoA signalling is not well defined. Is GAP activity important in determining the effect on migration and caveolae formation? What is the effect on RhoA activity? Alternatively, the authors could investigate YAP dependent transcriptional regulation downstream of overexpression.

      We have addressed this point using overexpression and siRNA transfections. We overexpressed ARHGAP29 or ARHGAP29 lacking its GAP domain and performed WB analysis against pMLC (which is a commonly used and reliable readout for RhoA and myosin-II activity). Much to our surprise, overexpression of ARHGAP29 increased (rather than decreased) pMLC levels, partially in a GAP-dependent manner (see Author response image 1). This is puzzling, as ARHGAP29 is expected to reduce RhoA-GTP levels, which in turn is expected to reduce ROCK activity and hence pMLC levels. In addition, and also surprisingly, siRNA-mediated silencing of ARHGAP29 did not significantly change pMLC levels. By contrast, pMLC levels were strongly reduced in Cav1 siRNA treated cells (this is shown in Fig. 6A and 6B in the revised manuscript). These new data underscore the important role of caveolae in the control of myosin-II activity, but do not allow us to draw any firm conclusions about the role of ARHGAP29 at the cell rear.

      Author response image 1.

      Overexpression of ARHGAP29 increases, rather than reduces, pMLC in RPE1 cells.

      We are uncertain as to how to interpret the ARHGAP29 overexpression data presented in Author response image 1 and therefore decided not to include it in the manuscript. One possibility is that inactivation of RhoA below a certain critical threshold causes other mechanisms to compensate. For instance, the activity of alternative MLC kinases such as MLCK could be enhanced under these conditions. Another possibility is that ARHGAP29 controls MLC phosphorylation indirectly. For instance, it has been shown that ARHGAP29 promotes actin destabilization through inactivating LIMK/cofilin signalling 1. In agreement with this, we find that overexpression of ARHGAP29 reduces p-cofilin (serine 3) levels (see Author response image 2). Since cofilin and MLC crosstalk 4, it is possible that increased pMLC levels are the result of a feedback loop that compensates for the effect of actin depolymerisation. This is now discussed in the discussion section. Whichever the case, we hope the reviewers understand that deeper mechanistic insight into the intricate mechanisms of Rho signalling at the cell rear is beyond the scope of this manuscript.

      Author response image 2.

      Overexpression of ARHGAP29 reduces p-cofilin levels in RPE1.

      Reviewer #2 (Public Review):

      Girardello et al investigated the composition of the molecular machinery of caveolae governing their mechano-regulation in migrating cells. Using live cell imaging and RPE1 cells, the authors provide a spatio-temporal analysis of cavin-3 distribution during cell migration and reveal that caveolae are preferentially localized at the rear of the cell in a stable manner. They further characterize these structures using electron tomography and reveal an organization into clusters connected to the cell surface. By performing a proteomic approach, they address the interactome of caveolin-1 proteins upon mechanical stimulation by exposing RPE1 cells to hypo-osmotic shock (which aims to increase cell membrane tension) or not as a control condition. The authors identify over 300 proteins, notably proteins related to actin cytoskeleton and cell adhesion. These results were further validated in cellulo by interrogating protein-protein interactions using proximity ligation assays and hypo-osmotic shock. These experiments confirmed previous data showing that high membrane tension induces caveolae disassembly in a reversible manner. Eventually, based on literature and on the results collected by the proteomic analysis, authors investigated more deeply the molecular signaling pathway controlling caveolae assembly upon mechanical stimuli. First, they confirm the targeting of ROCK1 with Caveolin-1 and the implication of the kinase activity for caveolae formation (at the rear of the cell). Then, they show that RhoGAP ARHGAP29, a factor newly identified by the proteomic analysis, is also implicated in caveolae mechano-regulation likely through YAP protein and found that overexpression of RhoGAP ARHGAP29 affects cell motility. Overall, this paper interrogated the role of membrane tension in caveolae located at the rear of the cell and identified a new pathway controlling cell motility.

      Strengths:

      Using a proximity-based proteomic assay, the authors reveal the protein network interacting with caveolae upon mechanical stimuli. This approach is elegant and allows the identification of a substantial new set of factors involved in the mechano-regulation of caveolin-1, some of which have been verified directly in the cell by PLA. This study provides a compelling set of data on the interactions between caveolae and its cortical network which was so far ill-characterized.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The methodology demonstrating an impact of membrane tension is not precise enough to directly assess a direct role on caveolae at a subcellular scale, that is between the front and the rear of the cell. First, a better characterization of the "front-rear" cellular model is encouraged.

      We agree with the reviewer that a quantitative analysis of the caveolae front-rear polarity would strengthen our conclusions. To address this, we have analysed the localisation of Cav1 and cavins in detail and in a large pool of cells, both in fixed and live cells. Our quantification clearly shows that Cav1 and cavins are enriched at the cell rear. This is now shown in Figure 1 and Figure 1 - Figure Supplement 1. To demonstrate that Cav1/cavins are truly rear-localised we analysed live migrating cells expressing tagged Cav1 or cavins. This analysis, which was performed on several individual time lapse movies, showed that caveolae rear localisation is remarkably stable (e.g. Figure 1C and 1D). We also present novel data panels and movies showing caveolae dynamics during rear retractions, in dividing cells, and in cells that polarise de novo. This new data is now described in the first paragraph of the results section.

      Secondly, authors frequently present osmotic shock as "high membrane tension" stimuli. While osmotic shock is widely used in the field, this study is focused only on caveolae localized at the rear of cell and it remains unclear how the level of a global mechanical stimulus triggered by an osmotic shock could mimic a local stimulus.

      We agree with the reviewer that osmotic shock will cause a global increase in membrane tension and therefore is only of limited value for understanding how membrane tension is regulated at the rear, and how caveolae respond to such a local stimulus. It was not our aim nor is it our expertise to address such questions. To answer them, sophisticated optogenetic approaches or localised membrane tension measurements (e.g. through the use of the Flipper-TR probe) are needed. It is beyond the scope of this manuscript to perform such experiments. However, given the strong enrichment of caveolae at the cell rear, we believe it is justified to propose that the changes we observe in the proteome do (mostly) reflect changes in caveolae at the rear. We have now included several quantifications on fixed cells, live cells, and PLA assays to support that caveolae are highly enriched at the rear. In addition, and importantly, a recent preprint by the Roux lab shows that membrane tension gradients indeed exist in many migrating and non-migrating cells 5. Using very similar hypotonic shock assays, the Caswell lab also showed that low membrane tension at the rear is required for caveolae formation 6. We have included a section in the discussion in which we elaborate on how membrane tension is controlled in migrating cells, and how it might regulate caveolae rear localisation.

      In the present case, it remains unknown the extent to which this mechanical stress is physiologically relevant to mimic mechanical forces applied at the rear of a migrating cell.

      This is true. Our study does not address the nature of mechanical forces at the cell rear. This is a complex subject that is technically challenging to address, and therefore is beyond the scope of this manuscript.

      Some images are not satisfying to fully support the conclusions of the article.

      We agree that some of the images, in particular the ones presented for the PLA assays, do not always show a clear rear localisation of caveolae. We have explained above why this is the case. We hope that our new quantitative measurements, movies and figure panels address the reviewer’s concern.

      At this stage, an unbiased quantitative spatio-temporal analysis of caveolae upon well-defined mechanical stimuli is also needed.

      These are all very good points that were previously addressed beautifully by the Caswell group 6. To address this in part in our RPE1 cell system, we imaged RPE1 cells exposed to the ROCK inhibitor Y27632 (see Author response image 3). The data shows that cell rear retraction is impeded in response to ROCK inhibition, which is in line with several previous reports. Cavin-1 remained mostly associated with the cell rear, although the distribution appeared more diffuse. We believe this data does not add much new insight into how caveolae function at the rear, and hence was not included in the manuscript.

      Author response image 3.

      Effect of ROCK inhibition on cavin1 rear localisation and rear retraction. Cells were imaged one hour after the addition of Y27632.

      Cells on images, in particular Figure 1, are difficult to see. Signal-to-noise ratio in different cell areas could generate a bias. Since there is inconsistency between caveolae density and localization between Figures, more solid illustrations are needed along with quantitative analysis.

      As mentioned above, we have carefully analysed the localisation of caveolae in fixed cells (using Cav1 and cavin1 antibodies as well as Cav1 and cavin fusion proteins) and in live cells transfected with various different caveolae proteins. The analysis clearly demonstrates an enrichment of caveolae at the rear (Figure 1 and Figure 1 – Figure Supplement 1). Our tomography and TEM data supports this as well (Figure 2).

      References:

      1. Qiao Y, Chen J, Lim YB, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell reports. 2017;19(8):1495-1502.

      2. Rausch V, Bostrom JR, Park J, et al. The Hippo Pathway Regulates Caveolae Expression and Mediates Flow Response via Caveolae. Curr Biol. 2019;29(2):242-255 e246.

      3. Hung V, Udeshi ND, Lam SS, et al. Spatially resolved proteomic mapping in living cells with the engineered peroxidase APEX2. Nat Protoc. 2016;11(3):456-475.

      4. Wiggan O, Shaw AE, DeLuca JG, Bamburg JR. ADF/cofilin regulates actomyosin assembly through competitive inhibition of myosin II binding to F-actin. Dev Cell. 2012;22(3):530-543.

      5. Juan Manuel García-Arcos AM, Julissa Sánchez Velázquez, Pau Guillamat, Caterina Tomba, Laura Houzet, Laura Capolupo, Giovanni D’Angelo, Adai Colom, Elizabeth Hinde, Charlotte Aumeier, Aurélien Roux. Actin dynamics sustains spatial gradients of membrane tension in adherent cells. bioRxiv 2024.07.15.603517. 2024.

      6. Hetmanski JHR, de Belly H, Busnelli I, et al. Membrane Tension Orchestrates Rear Retraction in Matrix-Directed Cell Migration. Dev Cell. 2019;51(4):460-475 e410.

      7. Tsai TY, Collins SR, Chan CK, et al. Efficient Front-Rear Coupling in Neutrophil Chemotaxis by Dynamic Myosin II Localization. Dev Cell. 2019;49(2):189-205 e186.

      8. Mueller J, Szep G, Nemethova M, et al. Load Adaptation of Lamellipodial Actin Networks. Cell. 2017;171(1):188-200 e116.

      9. De Belly H, Yan S, Borja da Rocha H, et al. Cell protrusions and contractions generate long-range membrane tension propagation. Cell. 2023.

      10. Matthaeus C, Sochacki KA, Dickey AM, et al. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 2022;13(1):7234.

      11. Sinha B, Koster D, Ruez R, et al. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 2011;144(3):402-413.

      12. Lieber AD, Schweitzer Y, Kozlov MM, Keren K. Front-to-rear membrane tension gradient in rapidly moving cells. Biophysical journal. 2015;108(7):1599-1603.

      13. Shi Z, Graber ZT, Baumgart T, Stone HA, Cohen AE. Cell Membranes Resist Flow. Cell. 2018;175(7):1769-1779 e1713.

      14. Grande-Garcia A, Echarri A, de Rooij J, et al. Caveolin-1 regulates cell polarization and directional migration through Src kinase and Rho GTPases. The Journal of cell biology. 2007;177(4):683-694.

      15. Grande-Garcia A, del Pozo MA. Caveolin-1 in cell polarization and directional migration. Eur J Cell Biol. 2008;87(8-9):641-647.

      16. Ludwig A, Howard G, Mendoza-Topaz C, et al. Molecular composition and ultrastructure of the caveolar coat complex. PLoS biology. 2013;11(8):e1001640.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study details an enrichment of the IL-6 signaling pathway in human tendinopathy and applies transcriptional profiling to an advanced in vitro model to test IL-6 specific phenotypes in tendinopathy. Overall, the strength of evidence is solid yet incomplete, as transcriptomic measurements provide clarity, though functional studies including analysis of proliferation are needed to confirm these findings. This work will be of interest to stem cell biologists and immunologists.

      To functionally assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line. We found no evidence for this effect in acute injuries and acknowledge this in the revised manuscript.

      We further added data collected by combining fluorescence microscopy with human patient-derived tissue to strengthen the link between IL-6, IL-6R, and proliferation of CD90+ cells in chronic injuries.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 2 and 3 (new data)

      - Figure 7 (new data)

      - Results

      - Discussion

      Reviewer 1

      (1.1) First, the experimental approach does not directly assess proliferation, as such the conclusions regarding proliferation are not well supported. In the ex-vivo model, the use of cell counting approaches is somewhat acceptable since the system is constrained by the absence of potential influx of new cells. However, given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model), assessment of actual proliferation (e.g. EdU, BrdU, Ki67) is critical to support this conclusion.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could detect no effect of IL-6 on ScxGFP+ cells in an acute injury in vivo. We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 2.4.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (1.2) Second, the justification for the use of Scx-GFP+ cells as a progenitor population is not well supported. Indeed, in the discussion, Scx+ cells are treated as though they are uniformly a progenitor population, when the diversity of this population has been established by the cited studies, which do not suggest that these are progenitor populations. Additional definition/ delineation of these cells to identify the subset of these cells that may actually display other putative progenitor markers would support the conclusions. As it stands, the study currently provides important information on the impact of IL6 on Scx+ cells, but not tendon progenitors.

      We further delineated the extrinsic cell populations isolated from mouse Achilles tendons of ScxGFP+ mice using flow cytometric analysis and RT-qPCR. We used tendon population markers suggested by sc-RNA-seq of mouse Achilles tendons.

      (De Micheli et al., Am. J. Physiol. - Cell Physiol., 2020, 319(5), DOI: 10.1152/ajpcell.00372.2020)

      While a small subpopulation of these cells expressed typical progenitor markers (i.e. CD45 and CD146), we could detect no overlap with Scx+ cells. As suggested by the reviewer, we therefore replaced occurrences of “progenitor” in the manuscript with “fibroblast” and performed additional experiments with human patient-derived tissue sections and the fibroblast marker CD90.

      See comment 2.1.

      Changes:

      - Title

      - Abstract

      - Figure 2 (new data)

      - Figure 3 (new data)

      - Supplementary Figure 6 (new data)

      - Results

      - Discussion

      (1.3) Clarity regarding the relevance of the 'sheath-like' component of the assembloid would provide helpful context regarding which types of tendons are likely to have this type of communication vs. those that do not, and if there are differences in tendinopathy prevalence. Understanding why/how this communication between structures is relevant is important.

      Our assembloid concept is inspired by the structure of unsheathed tendons (i.e. biceps, semitendinosus, gracilis) and not sheathed tendons like the flexor tendons.

      We agree that clarity regarding the tendon type having this type of communication is important, so we sharpened previously blurry text passages in the revised manuscript.

      Text changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary Table 1

      - Supplementary Figure 3

      - Supplementary Figure 4

      (1.4) Minor: in the text for Figure 6 (2nd paragraph), the comma in 19,694 is superscripted.

      Corrections were made throughout the manuscript.

      Text changes:

      - Results, page 4

      - Results, page 12

      - Results, page 19

      - Results, page 21

      (1.5) Minor: The inclusion of the Scx-GFP mouse should be included in the schematic Figure 5.

      The results presented in the previous draft did not feature tissues from ScxGFP mice but used a Scx antibody to visually detect Scx+ cells. In anticipation of the revision process, we bred a new IL-6 KO x ScxGFP+ mouse line and repeated the experiment. As suggested by the reviewer, both the new schematic in figure 7 and the former figure 5 (now moved to the supplementary material) include this mouse.

      Figure changes:

      - Supplementary Figure 9 (former figure 5)

      - Figure 7

      Reviewer 2

      (2.1) One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of Achilles tendon is similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue.

      Tail tendon fascicles are thought to have a low number of reparative fibroblasts / progenitor cells because they lack a developed extrinsic compartment. Achilles tendons are supposed to have a higher number of reparative fibroblasts / progenitor cells, as their fascicles are surrounded by an extrinsic compartment.

      To verify this here, we added a better characterization and comparison of the cell populations isolated from the tail tendon fascicles and the Achilles tendons.

      First, we added representative light microscopy images of these cells at different timepoints after being cultured on tissue-culture plastic.

      Second, we performed flow cytometric analysis not only on the freshly digested tail tendon fascicles and Achilles tendons, but also on the cultured cells at the timepoint when they would have been embedded into the assembloids.

      Third, we compared the expression of population-specific markers in cells derived from tail tendon fascicle and Achilles tendons.

      As expected, tail tendon fascicle-derived cell populations appeared to be more elongated than Achilles tendon-derived populations shortly after isolation. Similarly, the “maintenance” fibroblasts in healthy tendons are more elongated than the reparative fibroblasts in diseased ones. After culture and priming in tendinopathic niche conditions, both populations assumed a more roundish, reparative phenotype.

      This was consistent with the flow cytometric analysis, which revealed a large difference between freshly isolated populations that disappeared after extended culture and priming in tendinopathic niche conditions. Gene expression in tail tendon fascicle-derived and Achilles tendon-derived cells was similar after extended culture and priming in tendinopathic niche conditions.

      See comment 1.2.

      See comment 2.10.

      Changes:

      - Supplementary Figure 6 (new data)

      - Results, page 11

      (2.2) The authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath?

      Insights gained from human patient-derived tissues (Figure 2) suggest that in a healthy tendon, most of the IL-6 is located in the extrinsic compartment, whereas in tendinopathic tendons it is distributed across compartments.

      Our assembloid design mimics this by embedding wildtype fibroblasts into the extrinsic compartment. Our hypothesis was that a wildtype core in tendinopathic niche conditions attracts reparative fibroblasts through IL-6, while an IL-6 knock-out core does not. Therefore, it was important to establish IL-6 gradients close to what they seem to be in vivo.

      Nevertheless, we have to acknowledge that the amount of IL-6 secreted by extrinsic fibroblasts in isolation is quite small compared to what is secreted by a wildtype core (Supplementary Figure 7). Attributing IL-6 in the supernatant of a WT core // WT fibroblast assembloid to the correct cell population is challenging but could be part of future research.  

      Changes:

      - Figure 2 (new data)

      - Supplementary Figure 7 (new data)

      - Results, page 12

      (2.3) Is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      The collective experience in our lab is that core explants exposed to tendinopathic niche conditions (i.e. serum, 37°C, high oxygen, and high glucose levels) assume a disease-like phenotype (e.g. Wunderli et al., Matrix Biology, 2020, Volume 89, https://doi.org/10.1016/j.matbio.2019.12.003, and Blache et al., Sci. Rep., 2021, 11(1), DOI 10.1038/s41598-021-85331-1).

      Specifically for our core // fibroblast co-culture system, we have reported the emergence of exaggerated tendinopathic hallmarks in a previous publication (Stauber et al., Adv. Healthc. Mater., 2021, 10(20), https://doi.org/10.1002/adhm.202100741).

      We clarified the use of previously validated tendinopathic niche conditions in this manuscript.

      Changes:

      - Introduction, page 3

      - Results, page 12

      (2.4) The results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures show strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors.

      As reviewer 1 pointed out as well, it is important to use a proliferation-specific marker “given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model)”.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in vivo, we repeated those experiments with a proliferation-specific EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line.

      Under this improved design, we could not detect an effect of IL-6 on proliferation in an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section and softened our statements in the title and the abstract.

      See comment 1.1.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.5) I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Indeed, another study using the same IL-6 knock-out strain showed that a lack of IL-6 signaling resulted in slightly inferior mechanical properties in healing patellar tendons (Lin et al., J. Biomech., 39(1), 2006, https://doi.org/10.1016/j.jbiomech.2004.11.009).

      The elevation of compensatory IL-6 superfamily ligands might also explain why we found no effect of IL-6 on the proliferation of Scx+ cells in an acute injury in vivo.

      Therefore, assessing the effects of IL-6 inhibitors on tendon healing following an acute injury would have been of great interest to us. Unfortunately, obtaining the necessary permission from the animal experimentation office for a new invasive treatment protocol was not feasible within this revision, owing to the severity degree of the procedure and time limitations.

      We incorporated and acknowledged these important points in the discussion.

      Text changes:

      - Introduction, page 3

      - Discussion, page 26

      (2.6) Do IL-6 knockout mice and/or mice treated with IL-6 inhibitors have delayed healing following Achilles tendon resection? Please provide experimental evidence.

      See comment 2.5.

      (2.7) I would suggest reducing claims on proliferation, or include a proliferation specific marker (e.g., Ki67, BrdU, EdU) in confocal analyses of Scx+ progenitors.

      See comment 1.1.

      See comment 2.4.

      (2.8) Supplementary Figures 1 and 2: the authors removed outliers. Please specify exactly which outliers were removed in the figures, and provide additional information on the criteria used to identify these outliers.

      To address this comment, we sharpened our criteria for identifying outliers and re-did the analysis depicted in figure 1.

      Briefly, we excluded 5 normal and 5 tendinopathic samples from sheathed tendons which have a different compartmental structure than unsheathed tendons.

      A completely separate analysis of the sheathed tendons would have been beyond the scope of this manuscript, but early screening suggested that IL-6 transcripts are not increased in sheathed tendinopathic tendons.

      We made text changes throughout the manuscript and to the supplementary table 1 and supplementary figure 2 to clearly state our criteria for excluding samples / outliers.

      Changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary table 1

      - Supplementary figure 2

      - Supplementary figure 3

      - Supplementary figure 4

      (2.9) Whenever "positive enrichment" is mentioned in the text, please specify in what group. It is presumed that the enrichment, for example, in the first figure is associated with tendinopathy samples compared to controls, though it is a bit unclear.

      The direction of the enrichment was added to the text.

      Text changes:

      - Abstract, page 1

      - Introduction, page 3

      - Results, page 4

      - Results, page 6

      - Results, page 12

      - Results, page 14

      - Results, page 19

      - Results, page 21

      - Discussion, page 25

      - Discussion, page 26

      - Discussion, page 27

      - Figure 1

      - Figure 5

      - Figure 8

      - Figure 9

      - Supplementary figure 3

      - Supplementary figure 4

      - Supplementary figure 6

      - Supplementary figure 8

      - Supplementary figure 11

      - Supplementary figure 12

      - Supplementary figure 14

      (2.10) Are tail tendon progenitors similar to Achilles tendon progenitors? Please provide a statement that shows similarity (in function, transcriptome, etc.) to support the in vitro tendon model.

      See comment 1.2.

      See comment 2.1.

      (2.11) Are the results in Figure 5F significant? It seems that your pictures show a dramatic change in migration, but the quantification does not?

      We repeated the in vivo studies with a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could not detect an effect of IL-6 on ScxGFP+ cell migration in an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.12) Please provide additional discussion points on cis- versus trans-IL6 signaling in your results found in mouse. Do you think researchers/clinicians would want to target trans-IL6 signaling based on your results? Please support these statements with the expression of IL6R on cells found in the tendon core and external sheath progenitors.

      To address this comment, we performed flow cytometric analysis on Achilles tendon-derived fibroblasts expanded in 2D and digested sub-compartments of the assembloids (Supplementary Figure 7).

      These data suggest that IL6R is expressed by neither core nor extrinsic fibroblasts, but mainly comes from core-resident CD45+ tenophages.

      Human samples co-stained for IL6R and CD68 (an established human macrophage marker) confirmed macrophages as a source of IL-6R in vivo. However, human samples co-stained for IL6R and CD90 (an established marker of reparative fibroblasts in humans) also detected IL6R on CD90+ cells, which have not yet been reported to express IL6R themselves.

      Overall, it is likely that trans-IL-6 signaling is more important for the activation of reparative fibroblasts than cis-IL-6 signaling. We added these statements to the manuscript.

      Changes:

      - Results, page 9

      - Results, page 12

      - Discussion, page 25

      - Discussion, page 26

      - Figure 3 (new data)

      - Supplementary figure 7 (new data)

      (2.13) Please provide more detail on collagen isolation from rat tail in the methods section.

      We provided more details on collagen isolation from rat tail in the experimental section (page 29).

      Changes:

      - Experimental section, page 29

      (2.14) Please comment on whether your in vitro system resembles tendinopathy or a steady state tendon. If it models more of a steady state system, would IL-6 still be relevant?

      See comment 2.3.

      Detailed feedback:

      Reviewer 1:

      This work by Stauber et al. is focused on understanding the signaling mechanisms that are associated with tendinopathy development, and by screening a panel of human tendinopathy samples, identified IL-6/JAK/STAT as a potential mediator of this pathology. Using an innovative explant model they delineated the requirement for IL-6 in the main body of the tendon to alter the dynamics of cells in the peritendinous synovial sheath space.

      The use of a publicly available existing dataset is considered a strength since this dataset includes expression data from several different human tendons experiencing tendinopathy. This facilitates the identification of potentially conserved regulators of the tendinopathy phenotype.

      The clear transcriptional shifts between WT and IL6-/- cores demonstrates the utility of the assembloid model, and supports the importance of IL6 in potentiating the cell response to this stimuli.

      Reviewer 2:

      The authors of this study describe a goal of elucidating the signaling pathways that are upregulated in tendinopathy in order to target these pathways for effective treatments. Their goal is honorable, as tendinopathy is a common debilitating condition with limited treatments. The authors find that IL-6 signaling is upregulated in human tendinopathy samples with transcriptomic and GSEA analyses. The evidence of their initial findings are strong, providing a clinically-relevant phenotype that can be further studied using animal models.

      Along these lines, the authors continue with an advanced in vitro system using the mouse tail tendon as the core with progenitors isolated from the Achilles tendon as the external sheath embedded in a hydrogel matrix. One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of Achilles tendon is similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue, and would consider the in vitro system to be an additional strength of this study.

      In order to address the IL-6 signaling pathway, the authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath? Furthermore, is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      Nevertheless, the results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures show strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors. The Achilles tendon injury model provides a nice in vivo confirmation of Scx-progenitor migration to the neotendon.

      Given their goal to elucidate signaling pathways that could be targeted in the clinic, I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Overall, the authors of this study elucidated IL-6 signaling in tendinopathy and provided a strong level of evidence to support their conclusions at the transcriptomic level. However, functional studies are needed to confirm these phenotypes and fully support their aims and conclusions. With these additional studies, this work has the potential to significantly influence treatments for those suffering from tendinopathy.

    1. Author response:

      (1) First, we wish to point out that there has not been a model for quantifying genetic drift in multi-copy gene systems.  Hence, the first attempt using the Haldane model is not expected to be familiar and readily acceptable. Nevertheless, the standard WF (Wright-Fisher) model cannot handle drift in multi-copy gene systems, such as viruses, due to the two levels of genetic drift – within individuals as well as between individuals of the population.

      [Point 1 responds to the comments that we did not engage with the literature, in particular, publications like the Canning model, which are extensions of the WF model. As pointed out above, models based on the WF sampling cannot handle the two levels of genetic drift.]

      (2) A crucial aspect of the study is the nature of the rRNA gene cluster, which is also a multi-copy gene system. It is easy to see that some multi-copy gene systems, like viral particles or mtDNAs, have a sub-population of genes within each individual. It is less obvious that tandem arrays of gene copies like rRNA genes can be treated as sub-populations that are subject to drift. Nevertheless, rRNA gene copies frequently transfer mutations among copies in the same cell via the homogenization process. Hence, rRNA genes do not have the "locus" property of single-copy genes, as they move about as well (a bit like transposons but via different mechanisms). Indeed, the collection of rRNA genes in a cell is referred to as the “community of genes” as cited in Fig. 1. Over hundreds of generations, rRNA genes are effectively a small gene pool like mtDNAs within cells. Furthermore, the copy number of rRNA genes also changes rapidly among individuals. For these reasons, genetic drift is operative within cells and this study aims to determine its strength (see Response 3 below).

      [Point 2 of the response addresses questions of Review #1 such as "(whether) the authors are referring to diversity in a single copy of an rRNA gene (or) diversity across the entire array of rRNA genes" or "(whether) the discussion of heterozygosity at rRNA ... is diversity per single copy locus or after collapsing loci together". The answer should be "the genetic diversity of the population of rRNA genes in the cell", noting that the single gene locus does not apply here. Similarly, a question like "Alignment to a single reference genome would likely lead to incorrect and even failed alignment for some reads" from Review #2 appears to be based on the homology concept of an rRNA gene locus. All rRNA gene copies are aligned against the consensus of the population of genes of the species. The consensus nucleotide nearly always accounts for > 90% of the gene copies in the population.]

      (3) We now clarify the meaning of C*, the effective copy number of rRNA genes. We apologize that the abstract is indeed unclear, and even misleading. In the abstract, we did not use different notations for the actual copy number (C) and the effective copy number (C*) of rRNA genes. Instead, we used the letter C to designate both. Furthermore, in the main text, the presentation of the effective number, C*, is overly complicated (in order to be realistic). We apologize. Slight modifications of the abstract should remove all the misunderstandings, as shown below.

      "On average, rDNAs have C ~ 150 - 300 copies per haploid in humans. While a neutral mutation of a single-copy gene would take 4N (N being the population size) generations to become fixed, the time should be 4NC* generations for rRNA genes where 1 << C* (C* being the effective copy number; C* > C or C* < C will depend on the strength of drift). However, the observed fixation time in mouse and human is < 4N, implying the paradox of C* < 1. Genetic drift that encompasses all random neutral evolutionary forces appears as much as 100 times stronger for rRNA genes as for single-copy genes, thus reducing C* to < 1."

      [Point 3 responds to the key criticisms. From Review #1: "The authors frame the number of rRNA genes as roughly equivalent to expanding the population size, ... a mutation can spread among rRNA gene copies is fundamentally different …". Indeed, the abstract can be very misleading when it uses CN interchangeably with C*N, essentially by allowing C to mean both.

      From Review #2: "In Eq (1), although C* is defined as the "effective copy number", it is unclear what it means in an empirical sense…". From the slightly revised text quoted above, it should be clear that the fixation time as well as the level of polymorphism represent the empirical measures of C*.]
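
      The relation between fixation time and the effective copy number described in Point 3 can be written compactly. This is only a sketch of the stated relations: the symbol T for the expected fixation time of a neutral mutation (in generations) and its subscripts are notation assumed here for illustration, and are not taken from the manuscript.

      ```latex
      % Assumed notation: T = expected fixation time of a neutral mutation (generations).
      % N = population size; C = actual copy number (~150-300 per haploid in humans);
      % C^* = effective copy number of rRNA genes.
      \begin{align*}
        T_{\text{single-copy}} &\approx 4N, \\
        T_{\text{rRNA}} &\approx 4NC^{*}, \\
        C^{*} &\approx \frac{T_{\text{rRNA}}}{4N} < 1
          \quad \text{whenever the observed } T_{\text{rRNA}} < 4N .
      \end{align*}
      ```

      Read this way, the observed fixation time divided by 4N serves as an empirical estimate of C*, and the paradox is that this estimate falls below 1 even though the actual copy number C is in the hundreds.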

      (4) Lastly, we shall address the misunderstood "reproductive success" of rRNA genes, which is the number of progeny, K, in the Haldane model. K should be more accurately referred to as the transmission speed. For single-copy genes, reproductive success and transmission both mean the same thing, K. But the term reproductive success is not appropriate for rRNA genes even though the formulae for K are the same for all gene systems.

      [Point 4 responds to all criticisms using the term "reproductive success"]

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Yun et al. examined the molecular and neuronal underpinnings of changes in Drosophila female reproductive behaviors in response to social cues. Specifically, the authors measure the ejaculate-holding period, which is the amount of time females retain male ejaculate after mating (typically 90 min in flies). They find that female fruit flies, Drosophila melanogaster, display shorter holding periods in the presence of a native male or male-associated cues, including 2-Methyltetracosane (2MC) and 7-Tricosene (7-T). They further show that 2MC functions through Or47b olfactory receptor neurons (ORNs) and the Or47b channel, while 7-T functions through ppk23 expressing neurons. Interestingly, their data also indicates that two other olfactory ligands for Or47b (methyl laurate and palmitoleic acid) do not have the same effects on the ejaculate-holding period. By performing a series of behavioral and imaging experiments, the authors reveal that an increase in cAMP activity in pC1 neurons is required for this shortening of the ejaculate-holding period and may be involved in the likelihood of remating. This work lays the foundation for future studies on sexual plasticity in female Drosophila.

      The conclusions of this paper are mostly supported by the data, but aspects of the lines used for individual pC1 subtypes and visual contributions as well as the statistical analysis need to be clarified.

      (1) The pC1 subtypes (a - e) are delineated based on their morphology and connectivity. While the morphology of these neurons is distinct, they do share a resemblance that can be difficult to discern depending on the imaging performed. Additionally, genetic lines attempting to label individual neurons can easily be contaminated by low-level expression in off-target neurons in the brain or ventral nerve cord (VNC), which could contribute to behavioral changes following optogenetic manipulations. In Figures 5C - D, the authors generated and used new lines for labeling pC1a and pC1b+c. The line for pC1b+c was imaged as part of another recent study (https://doi.org/10.1073/pnas.2310841121). However, similar additional images of the pC1a line (i.e. 40x magnification and VNC expression) would be helpful in order to validate its specificity.

      We have included the high-resolution images of the expression of the pC1a-split-Gal4 driver in the brain and the VNC in the new figures S6A and S6B.

      (2) The author's experiments examining olfactory and gustatory contributions to the holding period were well controlled and described. However, the experiments in Figure 1D examining visual contributions were not sufficiently convincing as the line used (w1118) has previously been shown to be visually impaired (Wehner et al., 1969; Kalmus 1948). Using another wild-type line would have improved the authors' claims.

      It is evident that w1118 flies are visually impaired and are able to receive a limited amount of visual information in dim red light. Nevertheless, they are able to exhibit MIES phenotypes, which further supports the dispensability of visual information in MIES. In a 2024 study, Doubovetzky et al. (1) found that MIES in ninaB mutant females, which have defects in visual sensation, was not altered. This further corroborates our assertion that vision is likely to be of lesser importance than olfaction in MIES.

      (3) When comparisons between more than 2 groups are shown as in Figures 1E, 3D, and 5E, the comparisons being made were not clear. Adding in the results of a nonparametric multiple comparisons test would help for the interpretation of these results.

      We have revised figures 1E, 3D, 5E and the accompanying legends as suggested.

      Reviewer #2 (Public Review):

      The work by Yun et al. explores an important question related to post-copulatory sexual selection and sperm competition: Can females actively influence the outcome of insemination by a particular male by modulating the storage and ejection of transferred sperm in response to contextual sensory stimuli? The present work is exemplary for how the Drosophila model can give detailed insight into the basic mechanism of sexual plasticity, addressing the underlying neuronal circuits on a genetic, molecular, and cellular level.

      Using the Drosophila model, the authors show that the presence of other males or mated females after mating shortens the ejaculate-holding period (EHP) of a female, i.e. the time she takes until she ejects the mating plug and unstored sperm. Through a series of thorough and systematic experiments involving the manipulation of olfactory and chemo-gustatory neurons and genes in combination with exposure to defined pheromones, they uncover two pheromones and their sensory cells for this behavior. Exposure to the male-specific pheromone 2MC shortens EHP via female Or47b olfactory neurons, and the contact pheromone 7-T, present in males and on mated females, does so via ppk23 expressing gustatory foreleg neurons. Both compounds increase cAMP levels in a specific subset of central brain receptivity circuit neurons, the pC1b,c neurons. By employing an optogenetically controlled adenyl cyclase, the authors show that increased cAMP levels in pC1b and c neurons increase their excitability upon male pheromone exposure, decrease female EHP, and increase the remating rate. This provides convincing evidence for the role of pC1b,c neurons in integrating information about the social environment and mediating not only virgin but also mated female post-copulatory mate choice.

      Understanding context and state-dependent sexual behavior is of fundamental interest. Mate behavior is highly context-dependent. In animals subjected to sperm competition, the complexities of optimal mate choice have attracted a long history of sophisticated modelling in the framework of game theory. These models are in stark contrast to how little we understand so far about the biological and neurophysiological mechanisms of how females implement post-copulatory or so-called "cryptic" mate choice and bias sperm usage when mating multiple times.

      The strength of the paper is decrypting "cryptic" mate choice, i.e. the clear identification of physiological mechanisms and proximal causes for female post-copulatory mate choice. The discovery of peripheral chemosensory nodes and neurophysiological mechanisms in central circuit nodes will provide a fruitful starting point to fully map the circuits for female receptivity and mate choice during the whole gamut of female life history.

      We appreciate the positive response to our work.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      While appreciating the quality of the work the reviewers had a few key concerns that would greatly improve the manuscript. These are:

      (1) In some cases the specific statistical analyses are not clear. Could the authors please clarify what comparisons were made and the specific tests used?

      We have clarified the comparisons made in the multiple comparison analysis and specified the tests used in figures 1E, 3D, 5E.

      (2) Could the authors please include data that verify the expression patterns of their new reagent for pC1a, which will be useful for the community?

      Figure S6 was revised to include the expression of the pC1a-split-Gal4 gene in the brain (Fig. S6A) and the VNC (Fig. S6B).

      (3) A figure summarising their findings in the context of known circuitry will be useful.

      A new Figure 7 has been prepared, which provides a summary of our findings.

      (4) The SAG data are interesting. Do the authors wish to consider moving it to the main text or removing it if too preliminary?

      The supplementary figure 10 and related discussions in the discussion section have been removed.

      In the revised version of this manuscript, we present new evidence that the Or47b gene is required for 2MC-induced cAMP elevation in pC1 neurons, but not for 7-T-induced elevation (see Fig. 5F). This observation supports the conclusion that Or47b is a receptor for 2MC.

      The following paragraph was inserted at line 248 to provide a detailed description of the new findings: “To further test the role of Or47b in 2MC detection, we generated Or47b-deficient females with pC1 neurons expressing the CRE-luciferase reporter. Females with one copy of the wild-type Or47b allele, which served as the control group, showed robust CRE-luciferase reporter activity in response to either 2MC or 7-T. In contrast, Or47b-deficient females showed robust CRE-luciferase activity in response to 7-T, but little activity in response to 2MC. This observation suggests that the odorant receptor Or47b plays an essential role in the selective detection of 2MC (Fig. 5F).”

      In addition, the following sentence was inserted at line 308 in the discussion section: “In this study, we provide compelling evidence that 2MC induces cAMP elevation in pC1 neurons and EHP shortening via both the Or47b receptor and Or47b ORNs, suggesting that 2MC functions as an odorant ligand for Or47b.”

      Relative CRE-luciferase reporter activity of pC1 neurons in females of the indicated genotypes, incubated with a piece of filter paper perfumed with solvent vehicle control or the indicated pheromones immediately after mating. The CRE-luciferase reporter activity of pC1 neurons of Or47b-deficient females (Or47b2/2 or Or47b3/3) was observed to increase in response to 7-T but not to 2MC. To calculate the relative luciferase activity, the average luminescence unit values of the female incubated with the vehicle are set to 100%. Mann-Whitney Test (n.s. p > 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001). Gray circles indicate the relative luciferase activity (%) of individual females, and the mean ± SEM of data is presented.
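
      The normalization described in this legend can be illustrated with a minimal sketch. It assumes only what the legend states (the vehicle-group mean is set to 100%); the function name and all readings below are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_activity(values, vehicle_values):
    """Express each luminescence reading as a percentage of the vehicle-group mean."""
    baseline = mean(vehicle_values)
    return [100.0 * v / baseline for v in values]


# Hypothetical luminescence readings (arbitrary units), not data from the study.
vehicle = [200.0, 180.0, 220.0]
treated = [400.0, 300.0]

vehicle_pct = relative_activity(vehicle, vehicle)  # averages 100% by construction
treated_pct = relative_activity(treated, vehicle)
```

      Normalizing to the vehicle group in this way makes the vehicle mean 100% by construction, so each individual value reads directly as a percentage of the baseline.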

      Reviewer #1 (Recommendations For The Authors):

      (1) There was a discrepancy between the text and the figures. Based on the asterisks above the data in Figure S5A, the data supports only 150 ng of 7-T shortening the ejaculation holding period. However, the text states that (line 190) "150 or 375 ng of 7-T significantly shortened EHP." It would be helpful if the authors clarified this discrepancy.

      The sentence has been revised and now reads as follows: ‘150 ng of 7-T significantly shortened EHP’.

      (2) Based on the current organization of the text, it was not clear how 2MC was identified and its concentrations were known to be physiologically relevant. It would be helpful if the authors could expand on this in lines 178 - 179.

      The following sentences were inserted into the revised version of the manuscript at line 178: The EHP was therefore measured in females incubated in a small mating chamber containing a piece of filter paper perfumed with male CHCs, including 2-methylhexacosane, 2-methyldocosane, 5-methyltricosane, 7-methyltricosane, 10Z-heneicosene, 9Z-heneicosene, and 2MC at various concentrations (not shown). Among these, 2MC at 750 ng was the only one that significantly reduced EHP (Fig. 3A; Fig. S4). 2MC was mainly found in males, but not in virgin females (30). Notably, it is present in D. melanogaster, D. simulans, D. sechellia, and D. erecta, but not in D. yakuba (30, 60).

      (3) The inset pie chart image illustrating MIES in Figure 1A was difficult to interpret. It would be helpful if the authors used a different method for representing this (i.e. a timeline).

      Figure 1A was revised as suggested.

      (4) In lines 121 - 122, the authors state that the females are exposed to "actively courting naive wild type Canton S males." This was difficult to understand and might be improved by removing "actively courting."

      Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) Summary figure

      The story is quite comprehensive and contains a lot of detail regarding the interaction of signaling pathways, internal state, and sensory stimuli. I believe a schematic summary figure bringing together all findings could be very helpful and would make it much easier to understand the discussion!

      Figure 7 has been prepared, which provides a summary of the findings and an explanation of the current working model.

      (2) Figure S10/effect on SAG activation of EHP

      At the moment, the quite interesting and relevant result that SAG activation shortens EHP shown in Figure S10 is only referred to in the discussion. Maybe move this to the results and give it a bit more attention? Actually, I believe this is a very exciting finding that could also be the basis for some more interesting speculations about physiological relevance. Since SAG is silenced upon seminal fluid/sex peptide exposure after mating, a mating with failed SAG silencing (i.e. unusually high post-mating SAG activity) could indicate to the female that there was low or failed sex peptide/seminal fluid transfer. In such a case it would be probably advantageous for the female to decrease EHP and quickly remate, as females need the "beneficial" effects of seminal fluid on ovulation and physiology adaptation. SAG could therefore represent another arm of sensing male quality- here not via external pheromones, but internally, via sensing male sex peptide levels.

      If this is a bit preliminary and rather suited to start a new study, Figure S10 could also be removed from the current manuscript.

      Figure S10 and associated text were removed in the revised version of the manuscript.

      (3) PhotoAC experiments in pC1b,c: the authors find that raising cAMP levels in pC1b,c leads to a decrease in EHP. They argue that increased cAMP levels lead to higher excitability of pC1b,c. This implies that the activity of pC1b,c promotes mating plug ejection. I assume the authors have also tried activating pC1b,c directly by optogenetic cation channels? What is the outcome of this? If different from elevating cAMP levels: why so?

      We employed CsChrimson, a red light-sensitive channelrhodopsin, to investigate the effect of optogenetic activation of each pC1 subset on EHP. Optogenetic activation of pC1a, pC1d, or pC1e had little effect on EHP; however, optogenetic activation of pC1b, c significantly increased EHP. This observation was puzzling because optogenetic silencing of the same neurons also increased EHP. In this experiment, females expressing CsChrimson were exposed to red light for the entire period of EHP measurement. Therefore, we suspect that prolonged activation of pC1b and pC1c neurons depleted their neurotransmitter pool, resulting in a silencing effect, but this requires further testing.

      Author response image 1.

      The prolonged optogenetic activation of pC1b, c neurons increases EHP, mimicking silencing of pC1b, c neurons. Females of the indicated genotypes were cultured on food with or without all-trans-retinal (ATR). The ΔEHP is calculated by subtracting the mean of the reference EHP of females cultured in control ATR- food from the EHP of individual females in comparison. The female genotypes are as follows: (A) 71G01-GAL4/UAS-CsChrimson, (B) pC1a-split-Gal4/UAS-CsChrimson, (C) pC1b,c-split-Gal4/UAS-CsChrimson, (D) pC1d-split-Gal4/UAS-CsChrimson, and (E) pC1e-split-Gal4/UAS-CsChrimson. Gray circles indicate the ΔEHP of individual females, and the mean ± SEM of data is presented. Mann-Whitney Test (n.s. p > 0.05; *p < 0.05; ****p < 0.0001). Numbers below the horizontal bar represent the mean of the EHP differences between the indicated treatments.
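
      As a minimal sketch of the ΔEHP calculation described in this legend: the mean EHP of the reference (ATR-) group is subtracted from each individual EHP. The function name, the units (minutes), and all values are hypothetical illustrations rather than data from the study.

```python
from statistics import mean


def delta_ehp(ehp_values, reference_ehp_values):
    """Subtract the mean reference EHP from each individual female's EHP."""
    reference_mean = mean(reference_ehp_values)
    return [v - reference_mean for v in ehp_values]


# Hypothetical EHP values in minutes, not data from the study.
atr_minus = [90.0, 100.0, 110.0]  # reference group, cultured without ATR
atr_plus = [130.0, 150.0]         # comparison group, cultured with ATR

deltas = delta_ehp(atr_plus, atr_minus)
mean_difference = mean(deltas)    # the kind of number shown below the horizontal bar
```

      A positive mean difference corresponds to a longer EHP in the comparison group than in the reference group.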

      (4) Text edits

      In general, the manuscript is very well-written, clear, and easy to follow. I recommend small edits of the text and correction of typos in some places:

      l.92: "Drosophila females seem to signal the social sexual context through sperm ejection." This sentence could give the impression that the main function of sperm ejection was to signal to conspecifics. I recommend reformulating to leave it open if ejected sperm is a signal or rather a simple cue. e.g. :"There is evidence that Drosophila females detect the social sexual context through sperm ejected by other females."

      Thanks for the good suggestion. It has been revised as suggested. In addition, we have also made additional changes to the text to correct typos.

      l.97: "transcriptional factor" > "transcription factor"

      Revised as suggested. See lines 77, 98, and 201.

      l.101: "There are Dsx positive 14 pC1 neurons in each brain hemisphere of the brain," > "There are 14 Dsx positive pC1 neurons in each brain hemisphere,"

      Revised as suggested, it now reads " There are 14 Dsx-positive pC1 neurons in each hemisphere of the brain, ...".

      l.160: ", even up to 1440 ng" > ", even when applied at concentrations as high as 1440 ng"

      Revised as suggested.

      l.168: "females with male oenocytes significantly shortens EHP" >"females with male oenocytes significantly shorten EHP"

      Revised as suggested.

      l.181: "it was restored when Orco expression is reinstated" >"it was restored when Orco expression was reinstated"

      Revised as suggested. See line 186.

      l.196: "MIES is almost completely abolished" >"MIES was almost completely abolished"

      Revised as suggested. See line 201.

      l.202: "a sexually dimorphic transcriptional factor gene" >"the sexually determination transcription factor gene" or "the sex specifically spliced transcription factor gene". The gene itself is not dimorphic!

      Revised as suggested, lines 208-210 now read "The same study found that Dh44 receptor neurons involved in EHP regulation also express doublesex (dsx), which encodes sexually dimorphic transcription factors."

      l.211: "to silenced" > "to silence"

      Revised as suggested. See line 216.

      l.229: "females that selectively produce the CRE-Luciferase reporter gene" >"females that selectively express CRE-Luciferase reporter"

      Revised as suggested. See line 234.

      l.271: "neurons. expedite" > delete dot

      Revised as suggested. See line 284.

      l.287: "Furthermore, our study has uncovered the conserved neural circuitry that processes male courtship cues and governs mating decisions play an important role in regulating this behavior." > grammar: "our study has uncovered that the conserved neural circuitry that processes male courtship cues and governs mating decisions plays an important role in regulating this behavior." Also: the meaning of "conserved" is not fully clear to me here: conserved in regards to other Drosophila species? Or do the authors mean: general functional similarity with mouse sexual circuitry?

      The sentence (lines 299-301) has been revised for clarity to read "In addition, our study has revealed that the neural circuit that processes male courtship cues and controls mating decisions plays an important role in regulating this behavior. This fly circuit has recently been proposed to be homologous to VMHvl in the mouse brain (45, 46).”

      l.311: "lipid drolet" > "lipid droplets"

      Revised as suggested. See line 325.

      l.316 and in several instances in the following, including Figure 5 caption (l.723) : "cAMP activity" > "cAMP levels" or "increased cAMP levels"

      Revised as suggested.

      l.323: "in hemibrain" > ", as seen in the hemibrain connectome dataset"

      Revised as suggested. See line 337.

      l.326: "increased cAMP levels causes pC1b,c neurons" > "increased cAMP levels cause pC1b,c neurons"

      Revised as suggested. See line 340.

      l.329: "removement" > "removal" or "ejection"

      Revised as suggested, it now reads "the removal of the mating plug". See line 343.

      l. 330: "This observation well aligns" > "The observation aligns well"

      Revised as suggested. See line 345.

      l. 398: Behavior assays: It would be good to describe how mating plug ejection was identified- by eye? Under the microscope/UV light?

      The following sentence has been added to the behavioral assays section at lines 425-426: The sperm ejection scene, in which the female expels a white sac containing sperm and the mating plug through the vulva, has been directly observed by eye in recorded video footage.

      l.685, Figure legend 2: "thermal activation" > "thermogenetic activation"

      Revised as suggested. See line 430.

      Reference:

      (1) Doubovetzky, N., Kohlmeier, P., Bal, S., & Billeter, J. C. (2023). Cryptic female choice in response to male pheromones in Drosophila melanogaster. bioRxiv, 2023-12.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared encoding of odor identity and value by calcium signaling from neurons in the ventral pallidum (VP) in comparison to D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not the VTA or other examined regions, in contrast to accumbal D1 neurons which project strongly to the VTA as well as the VP. They examine single unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two different sucrose, neutral and air puff cues with different chemical characteristics. They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor. Direct comparisons of single unit responses between the two regions now utilize linear mixed effects models with random effects for subject.

      Weaknesses:

      The manuscript still includes mention of differences in effect size or differing "levels" of significance between VP and OT D1 neurons without reports of direct comparisons between the two populations. This is somewhat mitigated by the comprehensive statistical reporting in the supplemental information, but interpretation of some of these results is clouded by the inclusion of OT D2 neurons in these analyses, and the limited description or contextualization in the main text.

      We think the reviewer is mistaken and have clarified the text.  Each pairwise comparison between VP, OTD1 and OTD2, for each odor across days is shown as a heatmap in supplementary figure 8B, with further details in table 37. Absolute diff 3H no statistics

      Reviewer #2 (Public Review):

      We appreciate the authors' revision of this manuscript and toning down some of the statements regarding "contradictory" results. We still have some concerns about the major claims of this paper which lead us to suggest this paper undergo more revision as follows since, in its present form, we fear this paper is misleading for the field in two areas. Here is a brief outline:

      (1) Despite acknowledging that the injections only occurred in the anteromedial aspect of the tubercle, the authors still assert broad conclusions regarding where the tubercle projects and what the tubercle does. For instance, even the abstract states "both D1 and D2 neurons of the OT project primarily to the VP and minimally elsewhere" without mention that this is the "anteromedial OT". Every conclusion needs to specify this is stemming from evidence in just the anteromedial tubercle, as the authors do in some parts of the discussion.

      We have clarified in multiple locations that we recorded from the anteromedial OT, including the abstract, and further clarified this in the conclusions throughout the results and discussion. We refrain from stating “anteromedial OT” at every mention of the OT, but think we have now made it abundantly clear that our observations are from the anteromedial OT. It is worth noting that retrograde tracing from the VTA did not label any neuron in any part of the OT, suggesting that the conclusion may well extend beyond the anteromedial portion. We acknowledge, though, that further work is needed to comprehensively characterize the OT outputs.

      (2) The authors now frame the 2P imaging data that D1 neuron activity reflects "increased contrast of identity or an intermediate and multiplexed encoding of valence and identity". I struggle to understand what the authors are actually concluding here. Later in discussion, the authors state that they saw that OT D1 and D2 neurons "encode odor valence" (line 510). 

      The point we aim to make is that valence encoding is different between the OT and VP. We do not think the reward-modulated activity in OT is valence encoding, at least not as it is in the VP. We do observe some valence encoding at the population level, which is different from individual valence-encoding neurons. The ability of classifiers to segregate population activity based on reward might be considered valence encoding, but we contrast it with that in VP, where individual neurons signal reward prediction. This is more robust than that in the OT data, where few neurons robustly encode valence. The increased response of the OTD1 neurons after reward association is more consistent with contrast enhancement than valence encoding. We believe this distinction is important and reflects a transformation between two reward-related brain areas. To clarify the sentence in question, we have changed it to read: reflects “increased contrast of identity or an intermediate encoding of valence that also encodes identity.” (line 488)

      We appreciate the authors' note that there is "poor standardization" when it comes to defining valence (line 521). We are ok with the authors speculating and think this revision is more forthcoming regarding the results and better caveats the conclusions. I suggest in the abstract the authors adjust line 14/15 to conclude that, "While D1 OT neurons showed larger responses to rewarded odors, in line with prior work, we propose this might be interpreted as identity encoding with enhanced contrast." [eliminating "rather than valence encoding" since that is a speculation best reserved for discussion as the authors nicely do.]

      We accept this suggestion and have modified the abstract sentence to say, “Though D1 OT neurons showed larger responses to rewarded odors than other odors, consistent with prior findings, we interpret this as identity encoding with enhanced contrast.” We believe this is appropriately qualified as an interpretation, and should not be confusing.

      The above items stated, one issue comes to mind, and that is, why of all reasons would the authors find that the anteromedial aspect of the tubercle is not greatly reflecting valence. The anteromedial aspect of the tubercle, over all other aspects of the tubercle, is thought by many to more greatly partake in valence and other hedonic-driven behaviors given its dense reception of VTA DAergic fibers (as shown by Ikemoto, Kelsch, Zhang, and others). So this finding is paradoxical in contrast to if the authors had studied the anterolateral tubercle or posterior lateral tubercle which gets less DA input.

      We agree that this seems surprising. This is why we focused on the anteromedial OT, expecting to find valence encoding. It remains possible that other parts of the OT, or more dorsal aspects of the anteromedial OT, encode valence, as has been reported by Murthy and colleagues. However, it remains unclear if their recordings are in the OT or VP. Nonetheless, our findings indicate that more work is required to understand the contribution of the OT to valence encoding. It is also important to note that our conclusions are drawn in comparison to the VP, which has more robust valence encoding than the OT. Thus, in comparison, the OT sample in our recordings lacks robust valence signaling. We think this comparison is important, because the lack of a clear framework for defining valence may have created misleading statements in past OT work.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows conspicuous amount of modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT may not be reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The study comes to a different conclusion about the olfactory tubercle regarding reward representations from several other prior works. Whether this stems from a difference in the experimental configurations such as behavioral paradigms used or indeed points to a conceptually different role for the olfactory tubercle remains to be seen.

      We acknowledge that our results lead us to conclusions that are different from those of prior work. But we note that our results are not directly at odds, as we see reward modulation of D1 OT neurons similar to that reported previously. Our conclusion is different because we contrast our OT responses with those in the VP, where valence is more robustly encoded at the single-neuron level. We also note that many of the past studies do not define valence as stringently as we do. Thus, increased activity with reward, as observed in our data and past studies, seems more like reward modulation than valence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work explored intra and interspecific niche partitioning along spatial, temporal, and dietary niche partitioning between apex carnivores and mesocarnivores in the Qilian Mountain National Park of China, using camera trapping data and DNA metabarcoding sequencing data. They conclude that spatial niche partitioning plays a key role in facilitating the coexistence of apex carnivore species, spatial and temporal niche partitioning facilitate the coexistence of mesocarnivore species, and spatial and dietary niche partitioning facilitate the coexistence between apex and mesocarnivore species. The information presented in this study is important for wildlife conservation and will contribute substantially to the current understanding of carnivore guilds and effective conservation management in fragile alpine ecosystems.

      Strengths:

      Extensive fieldwork is evident in the study. Aiming to cover a large percentage of the Qilian Mountain National Park, the study area was subdivided into squares, as a geographical reference to distribute the sampling points where the camera traps were placed and the excreta samples were collected.

      They were able to obtain many records in their camera traps and collected many samples of excreta. This diversity of data allowed them to conduct robust analyses. The data analyses carried out were adequate to obtain clear and meaningful results that enabled them to answer the research questions posed. The conclusions of this paper are mostly well supported by data.

      The study has demonstrated the coexistence of carnivore species in the landscapes of the Qilian Mountains National Park, complementing the findings of previous studies. The information presented in this study is important for wildlife conservation and will contribute substantially to the current understanding of carnivore guilds and effective conservation management in fragile alpine ecosystems.

      Weaknesses:

      It is necessary to better explain the methodology because it is not clear what is the total sampling effort. In methodology, they only claim to have used 280 camera traps, and in the results, they mention that there are 319 sampling sites. However, the total sampling effort (e.g. total time of active camera traps) carried out in the study and at each site is not specified.

      Thanks a lot for this detailed review! We apologize for not offering a distinct description of the overall sampling effort. In this study, we deployed 280 camera traps, and these cameras were active for approximately 4 to 6 months. We visited each camera 2 to 3 times annually to download photos and check the batteries. In case some cameras failed to capture the targeted carnivore, we would relocate those cameras. Eventually, we surveyed 322 camera trapping sites, among which 3 cameras failed due to loss. As a result, we analyzed data from 319 camera sites and obtained 14,316 independent detections over 37,192 trap-days.

      We have added this information as follows at lines 132 to 143: “Taking into account the fact that mammalian communities are sensitive to seasonality, we used camera traps to monitor animals with an extensive survey effort from December 2016 to February 2022, covering the activity of animal species in different seasons, which can reflect the overall distribution of carnivores. We placed a total of 280 infrared cameras at the study site, set them to be active for 4 to 6 months, and considered possible relocation to another position based on animal detection in an effort to improve estimates of the occupancy and detection rates for both common and rare species (Figure 1) (Kays et al., 2020). The camera trap was set to record the time and date on a 24 hr clock when triggered, and to record a 15s video and 1 photo with an interval of 2 minutes between any two consecutive triggers. The sum of camera trap effective days was defined by the total amount of trapping effort during the sampling period, which was calculated from the time the camera was placed in operation to the time the last video or photograph was taken. We visited each camera 2 to 3 times a year to download photos and check batteries.” and at lines 228 to 232: “A total of 322 camera trap sites were surveyed after relocating infrared cameras that did not capture any target carnivore species. A total of 3 cameras were considered to have failed due to loss. We analyzed data from 319 camera sites and obtained 14,316 independent detections during a total effort of 37,192 effective camera trap days. We recorded wolf in 26 sites, snow leopard in 109 sites, Eurasian lynx in 36 sites, red fox in 92 sites, and Tibetan fox in 34 sites.”
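
      The definition of effective trap days quoted above (from the time a camera was placed in operation to the time of its last video or photograph) can be sketched as follows. This is a minimal illustration under stated assumptions: the deployment dates are hypothetical, the function names are invented for the example, and whether the first day is counted inclusively is an assumption the response does not specify (it is not counted here).

```python
from datetime import date


def effective_days(deployed, last_record):
    """Days between deployment and the last recorded video or photograph."""
    return (last_record - deployed).days


def total_effort(cameras):
    """Sum effective days over all (deployed, last_record) pairs."""
    return sum(effective_days(start, end) for start, end in cameras)


# Hypothetical deployments, not the survey's actual schedule.
cameras = [
    (date(2017, 1, 1), date(2017, 5, 1)),
    (date(2018, 6, 1), date(2018, 11, 28)),
]

effort = total_effort(cameras)
```

      Summing per-camera effective days in this way gives a total effort figure of the same kind as the trap-day total reported above, which is the denominator typically used when comparing detection counts across sites.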

      Reviewer #2 (Public Review):

      Summary:

      The study entitled "Different coexistence patterns between apex carnivores and mesocarnivores based on temporal, spatial, and dietary niche partitioning analysis in Qilian Mountain National Park, China" by Cong et al. addresses the compelling topic of carnivores' coexistence in a biodiversity hotspot in China. The study is interesting given it considers all three components affecting sympatric carnivores' distribution and co-occurrence, namely the temporal, the spatial, and the dietary partition within the carnivore guild. The authors have found that spatial co-occurrence is generally low, which represents the major strategy for coexistence, while there is temporal and dietary overlap. I also appreciated the huge sampling effort carried out for this study by the authors: they were able to deploy 280 camera trapping sites (which became 322 in the result section?) and collect a total of 480 scat samples. However, I have some concerns about the study on the non-consideration of the human dimension and potential anthropogenic disturbance that could affect the spatial and temporal distribution of carnivores, the choice of the statistical model to test co-occurrence, and the lack of clearly stated ecological hypotheses.

      Strengths:

      The strengths of the study are the investigation of all three major strategies that can mitigate carnivores' coexistence, therefore, the use of multiple monitoring techniques (both camera trapping and DNA metabarcoding) and the big dataset produced that consists of a very large sampled area with a noteworthy number of camera trap stations and many scat samples for each species.

      Weaknesses:

      I think that some parts of the manuscript should be written better and more clearly. A clear statement of the ecological hypotheses that could affect the partitioning among the carnivore guild is lacking. I think that the human component (thus anthropogenic disturbance) should have been considered more in the spatial analyses given it can influence the use of the environment by some carnivores. Additionally, a multi-species co-occurrence model would have been a more robust approach to test for spatial co-occurrence given it also considers imperfect detection.

      Thank you very much for your valuable comments and suggestions. We have checked and edited the manuscript, and we believe the English has improved.

      (1) Following your suggestion, we added the competitive exclusion and niche differentiation hypotheses, along the spatial, temporal, and dietary axes, to the Introduction to explain co-occurrence relationships among species, as follows: “The competitive exclusion principle dictates that species with similar ecological requirements are unable to successfully coexist (Hardin, 1960; Gause, 1934). Thus, carnivores within a guild occupy different ecological niches based on a combination of three niche dimensions, i.e. spatial, temporal, and trophic (Schoener, 1974). Spatially, carnivore species within the same geographic area exhibit distinct distributions that minimize overlap in resource use and competition. For example, carnivores can partition habitats based on habitat feature preferences and availability of prey (De Satgé et al., 2017; Garrote and Pérez De Ayala, 2019; Gołdyn et al., 2003; Strampelli et al., 2023). Temporally, differences in seasonal or daily activity patterns among sympatric carnivores can reduce competitive interactions and facilitate coexistence. For example, carnivores can exhibit temporal segregation in their foraging behaviors, such as diurnal versus nocturnal activity, to avoid direct competition (Finnegan et al., 2021; Nasanbat et al., 2021; Searle et al., 2021). Trophically, carnivore species can diversify their diets to exploit different prey species or sizes, thereby reducing competition for food resources. For example, carnivores can exhibit dietary specialization to optimize their foraging efficiency and minimize competitive pressures (Steinmetz et al., 2021).”

      (2) In addition to distance from roads, we included a human-dimension covariate influencing occupancy, based on the number of independent photos or videos of herders and livestock detected by infrared cameras (named human disturbance and abbreviated hdis). According to the occupancy models, red fox occupancy probability showed a significant positive relationship with hdis. Moreover, the detection probabilities of snow leopard and Eurasian lynx decreased with increasing hdis.

      We have incorporated these results into the Results as follow: “According to the findings derived from single-season, single-species occupancy models, the snow leopard demonstrated a notably higher probability of occupancy compared to other carnivore species, estimated at 0.437 (Table 1). Conversely, the Eurasian lynx exhibited a lower occupancy probability, estimated at 0.161. Further analysis revealed that the occupancy probabilities of the wolf and Eurasian lynx declined with increasing Normalized Difference Vegetation Index (NDVI) (Table 2, Figure 2). Additionally, wolf occupancy probability displayed a negative relationship with roughness index and a positive relationship with prey availability. Snow leopard occupancy probabilities exhibited a negative relationship with distance to roads and NDVI. In contrast, both red fox and Tibetan fox demonstrated a positive relationship with distance to roads. Moreover, red fox occupancy probability increased with higher human disturbance and greater prey availability. The detection probabilities of wolf, snow leopard, red fox, and Tibetan fox exhibited an increase with elevation (Table 2). Moreover, there was a positive relationship between the detection probability of Tibetan fox and prey availability. The detection probabilities of snow leopard and Eurasian lynx declined as human disturbance increased.”

      (3) We appreciate the suggestion to use a multi-species co-occurrence model to test spatial co-occurrence. We attempted multispecies occupancy modeling of the five species in our study, following the method of Rota et al. (2016). We first fitted single-season, single-species occupancy models to simplify the set of candidate models. We then took the occupancy covariates from the best model for each species and used them to build the multispecies occupancy models. Unfortunately, the final models did not converge. We are investigating potential solutions to this problem.

      Rota CT, Ferreira MAR, Kays RW, Forrester TD, Kalies EL, McShea WJ, Parsons AW, Millspaugh JJ. 2016. A multispecies occupancy model for two or more interacting species. Methods Ecol Evol 7:1164–1173. doi:10.1111/2041-210X.12587

      Temporal and dietary results are solid and this latter in particular highlights a big predation pressure on some prey species such as the pika. This implies important conservation and management implications for this species, and therefore for the trophic chain, given that i) the pika population should be conserved and ii) a potential poisoning campaign against small mammals could be incredibly dangerous also for mesocarnivores feeding on them due to secondary poisoning.

      Thank you for your thoughtful comments. We appreciate your recognition of the temporal and dietary findings, particularly the highlighted predation pressure on prey species like the pika. These observations indeed underscore critical implications for conservation and management. The necessity to conserve the pika population is paramount for its role in maintaining the stability of the trophic chain within its ecosystem. As you rightly pointed out, any disruption to this delicate balance, including through predation or indirect threats like poisoning campaigns, could have far-reaching consequences. Regarding the potential risks associated with poisoning campaigns targeting small mammals, we acknowledge the significant concerns raised about secondary poisoning affecting mesocarnivores. This underscores the need for careful consideration in pest control strategies and the adoption of measures that minimize unintended ecological impacts. Our findings suggest several practical implications for conservation and management. Conservation efforts should focus on vulnerable prey populations such as the pika, while management strategies could include regulatory frameworks and community education to mitigate risks associated with pest control methods. We believe our study contributes valuable insights into the complexities of predator-prey dynamics and the broader implications for ecosystem health. By integrating these findings into conservation practices, we can work towards ensuring the sustainability of natural systems and the species that depend on them.

      Reviewer #1 (Recommendations For The Authors):

      To better explain the methodology and the sampling effort I recommend reviewing e.g. Kays et al. 2020. An empirical evaluation of camera trap study design: How many, how long, and when?. Methods in Ecology and Evolution, 11(6), 700-713. https://besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.13370.

      Thank you for this valuable suggestion! According to this reference, we have added this information to explain the methodology and the sampling effort as follow: “Taking into account the fact that mammalian communities are sensitive to seasonality, we used camera traps to monitor animals with an extensive survey effort from December 2016 to February 2022, covering the activity of animal species in different seasons, which can reflect the overall distribution of carnivores. We placed a total of 280 infrared cameras at the study site, set them to be active for 4 to 6 months, and considered possible relocation to another position based on animal detection in an effort to improve estimates of the occupancy and detection rates for both common and rare species (Figure 1) (Kays et al., 2020). The camera trap was set to record the time and date on a 24 hr clock when triggered, and to record a 15s video and 1 photo with an interval of 2 minutes between any two consecutive triggers. The sum of camera trap effective days was defined by the total amount of trapping effort during the sampling period, which was calculated from the time the camera was placed in operation to the time the last video or photograph was taken. We visited each camera 2 to 3 times a year to download photos and check batteries.”

      Reviewer #2 (Recommendations For The Authors):

      I have some concerns about the manuscript.

      I find that the manuscript should be written more clearly: some sentences are not straightforward to understand given the presence of structural errors that make the text hard to read; the paragraphs should be written in a more harmonic way (without logical leaps) with a smoother change of topic between paragraphs, especially in the introduction.

      We appreciate your constructive comments, which have helped us improve the clarity and coherence of the manuscript. We have revised the introduction to provide a clearer outline of the paper's structure and objectives. Specifically, we have rephrased complex sentences and removed ambiguities to ensure that each idea is communicated more straightforwardly. We have also provided clearer links between ideas and avoided abrupt shifts in topic to ensure a smoother transition between paragraphs.

      I feel like the strength of merging the two techniques (camera trapping and DNA metabarcoding) is not brought up enough, while the disadvantage of this approach is not even mentioned (e.g., the increasing costs).

      Thanks a lot for this valuable comment! We have added this information to the Discussion (L356-L363) as follows: “Our study highlights the effectiveness of combining camera trapping with DNA metabarcoding for detecting and identifying both cryptic and rare species within a sympatric carnivore guild. This integrated approach allowed us to capture a more comprehensive view of species presence and interactions compared to traditional visual surveys. However, it is important to acknowledge the challenges associated with this technique, including the high costs of equipment and the need for specialized training and computational resources to manage and analyze the large volumes of sequence data. Despite these challenges, the benefits of this combined method in improving biodiversity assessments and understanding species coexistence outweigh the drawbacks.”

      The structure of the manuscript does not follow the structure of the journal (Intro, Material and Method, Results, Discussion instead it reports the methods at the end of the main manuscript), and, most critically, I found that a clear explanation of the research hypothesis is missing: authors should clearly state they ecological hypotheses. What are your hypotheses on the co-occurrence relationship among species? What would specifically affect and change the sympatric relationships among carnivores?

      Thank you for this valuable suggestion! We have revised the manuscript and moved the methods section into the main body so that it follows the standard order of sections (Introduction, Materials and Methods, Results, Discussion).

      Our main ecological hypotheses concerning the co-occurrence relationships among carnivore species are based on the niche differentiation hypothesis. We hypothesize that differentiation along one or more niche axes is beneficial for the coexistence of the carnivore guild in the Qilian Mountains. We expected that spatial niche differentiation promotes the coexistence of large carnivores in the Qilian Mountain region, as they are more likely than small carnivores to spatially avoid interspecific competition (Davis et al., 2018). Mesocarnivores may coexist either spatially or temporally due to increased interspecific competition for similar prey (Di Bitetti et al., 2010; Donadio and Buskirk, 2006). Nutritional niche differentiation may be a significant factor in promoting coexistence between large carnivore and mesocarnivore species due to differences in body size (Gómez-Ortiz et al., 2015; Lanszki et al., 2019). We have added these ecological hypotheses in lines 101 to 110.

      Another concern is that all pictures with people have been removed from the dataset, but I think that this could be a bit biased as human presence (or also the presence of livestock) could affect the spatial or temporal presence of carnivores, changing their co-occurrence dynamics. On one side, humans can be perceived as a source of disturbance by carnivores and, therefore, can cause a shift in distribution towards locations with lower human presence (or lower anthropogenic disturbance) that could further concentrate the presence of carnivores increasing the competitive interaction. Conversely, mesocarnivores could take advantage of an increasing human presence - following the human shield hypotheses - finding a refugium from larger body carnivores. From this perspective, important information on the potential anthropogenic pressure is lacking in the description of the study area: how effective is the protection effort of the park? How intense is the potential human disturbance in and around the park? Is there poaching? Intensive livestock grazing? Resources extractions? These are all factors that could affect the interactions among carnivores. Do not forget the possibility and risk of being retaliatory killed by humans due to the presence of livestock in the area. I think that incorporating the human dimension is important because it could strongly affect how carnivores perceive and use the environment. Here only the distance to the closest road has been considered. However, for example, recent research (Gorczynski et al 2022, Global Change Biology) has indeed found that co-occurrece of ecologically similar species differed in relation to increasing human density. Therefore, I think that anthropogenic disturbance is an aspect to be reckoned with and more variables as proxy of human disturbance should be considered.

      Thanks a lot for this valuable comment! We acknowledge that humans can act both as a disturbance factor, potentially driving carnivores away from highly populated areas, and as a source of indirect refuge for mesocarnivores, thereby affecting competitive interactions among carnivores. Within the study area, poaching and resource extraction are prohibited, and livestock grazing is a significant human activity. Therefore, we added a human-dimension covariate influencing occupancy, based on the number of independent photos or videos of herders and livestock detected by infrared cameras (named human disturbance and abbreviated hdis). According to the occupancy models, red fox occupancy probability showed a significant positive relationship with hdis. Moreover, the detection probabilities of snow leopard and Eurasian lynx decreased with increasing hdis.

      In the statistical analyses section, I don't find that the statistical procedure is well described: it is not clear which occupancy model has been used (probably a single-species single-season occupancy model for each target species?), which covariates have been tested for each species and following which hypotheses. Additionally, I think that when modelling the spatial distribution of subordinate species, it should be important to include information on the spatial distribution of apex species given this could affect their occurrence on the territory. This could have been done by using the Relative Abundance Index of the apex predators as a covariate when modelling the distribution of subordinate species. Additionally, why haven't the authors used prey as a covariate for occupancy? I think that prey distribution should affect the occupancy probability more than the detection rate. Also, the authors used the Sørensen similarity index to measure associations between species. However, this association metric has been criticized (see the recent paper of Mainali et al 2022, Science Advances). I am therefore wondering: given the authors are using the occupancy framework, why don't they use a multi-species co-occurrence model that allows them to directly estimate both single-species occupancy and the co-occurrence parameter as a function of covariates (examples are Rota et al. 2016, Methods Ecol. Evol. Or Tobler et al. 2019, Ecology)? For the temporal overlap, I think that adding Figure S2 (pairwise temporal overlap) in the main text would help deliver the results of the temporal analyses more straightforwardly.

      Thanks a lot for this valuable comment!

      (1) The current manuscript uses a single-season, single-species occupancy model for each target species. Additionally, we have added prey and human disturbance as occupancy covariates. We have revised the statistical analyses section to explicitly state this model choice and clarify the covariates tested for each species in lines 153 to 170. The details are as follows: “To investigate the spatial distribution of carnivores, as well as the influence of environmental factors on the site occupancy of species in the study area, we performed single-season, single-species occupancy models to estimate carnivores’ occupancy (ψ) and detection (Pr) probability (Li et al., 2022b; MacKenzie, 2018; Moreno-Sosa et al., 2022). To ensure capture independence, only photo or video records at intervals of 30 min were included in the data analysis (Li et al., 2020). We created a matrix recording whether each carnivore species was detected (1) or not (0) across several 30-day intervals (that is 0-30, 31-60, 61-90, 91-120, 121-150, >150 days) for each camera location. Based on the previous studies of habitat use of carnivores (Greenspan and Giordano, 2021; Alexander et al., 2016; Gorczynski et al., 2022), we selected terrain, vegetation, biological factors and disturbance to construct the model. Terrain is a fundamental element of wildlife habitat and closely linked to other environmental factors (Chen et al., 2024). Terrain variables include elevation (ele) and roughness index (rix). Vegetation variables include normalized difference vegetation index (ndvi), and provide information on the level of habitat concealment. Biological variables include prey abundance (the number of independent photos of their preferred prey based on dietary analysis in this study, wolf and snow leopard: artiodactyla including livestock; Eurasian lynx and Pallas’s cat: lagomorpha; red fox and Tibetan fox: lagomorpha and rodentia) and reflect habitat preference and distribution patterns of carnivores. Disturbance variables include distance to roads (disrd) and human disturbances (hdis, the number of independent photos of herdsman and livestock) and can provide insight into the habitat selection and behavior patterns of carnivores.”
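
      The data preparation described in the quoted passage can be sketched in Python. This is a minimal illustration, not the authors' code: the function names, the rule that a record is kept when it falls at least 30 minutes after the last kept record, and the assignment of day 0 to the first interval are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def independent_detections(timestamps, gap_minutes=30):
    """Keep records at least `gap_minutes` after the last kept record.

    Assumption: independence is judged against the last *kept* record.
    """
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= timedelta(minutes=gap_minutes):
            kept.append(t)
    return kept


def detection_history(deploy_start, detections, n_bins=6, bin_days=30):
    """Return a 0/1 list with one entry per survey occasion.

    Occasions follow the intervals in the text: days 0-30, 31-60, 61-90,
    91-120, 121-150, and an open-ended last bin (>150 days).
    """
    history = [0] * n_bins
    for t in detections:
        day = (t - deploy_start).days
        if day < 0:
            continue  # record predates deployment; ignore it
        # Days 0-30 map to bin 0, days 31-60 to bin 1, and so on.
        idx = min(max(day - 1, 0) // bin_days, n_bins - 1)
        history[idx] = 1
    return history


if __name__ == "__main__":
    start = datetime(2017, 1, 1)  # hypothetical deployment date
    raw = [
        start + timedelta(days=2),
        start + timedelta(days=2, minutes=10),  # dropped: within 30 min
        start + timedelta(days=2, minutes=45),
        start + timedelta(days=40),
    ]
    kept = independent_detections(raw)
    print(len(kept), detection_history(start, kept))
```

      One row of this kind per camera site gives the sites-by-occasions matrix that a single-season occupancy model takes as input. The sketch records 0 for every occasion without a detection; occasions in which a camera was not operating would normally be coded as missing rather than 0.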

      (2) Thank you for your valuable suggestions. We acknowledge the importance of considering apex species in models of subordinate species' spatial distributions.

      Nonetheless, to keep the covariates consistent across species, and because single-species occupancy models do not represent interspecies interactions, we did not include the Relative Abundance Index of the apex predators as a covariate affecting the occupancy of mesopredators. As you recommended, multi-species occupancy models that account for interspecies interactions are a robust approach. We attempted to use the multi-species occupancy method of Rota et al. (2016), but the final models did not converge. Specifically, we took the occupancy covariates from the best single-species model for each species and used them to build the multispecies occupancy models. We are investigating potential solutions to this problem.

      (3) We used the Sørensen similarity index to measure associations between species based on support from previous literature. As counted by Mainali et al., the Sørensen index has been used in more than 700 papers across journals such as Science, Nature, and PNAS. We believe this index holds broad applicability in describing relationships between species.
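
      For readers unfamiliar with the index, the standard Sørensen similarity for presence/absence data is 2a / (2a + b + c), where a is the number of sites shared by two species and b and c are the numbers of sites occupied by only one of them. A minimal Python sketch follows; the site identifiers are hypothetical and are not taken from the study.

```python
def sorensen(sites_a, sites_b):
    """Sørensen similarity between two sets of occupied sites.

    Equivalent to 2a / (2a + b + c), because len(A) + len(B) = 2a + b + c.
    """
    a, b = set(sites_a), set(sites_b)
    if not a and not b:
        raise ValueError("both species have no detections; index undefined")
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical camera-site IDs where each species was detected.
    species_1 = {"S01", "S02", "S03"}
    species_2 = {"S02", "S03", "S04"}
    print(round(sorensen(species_1, species_2), 3))
```

      The index ranges from 0 (no shared sites) to 1 (identical site sets). It uses raw detections only, so it does not correct for imperfect detection.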

      (4) We agree that presenting pairwise temporal overlap in the main text would enhance clarity. We revised the manuscript to include Figure S2 in the main text and ensure that the temporal analyses are more straightforwardly presented.

      Regarding the sampling collection of the scats, I'm just curious to know why you decided to use silica desiccant instead of keeping the samples frozen. I'm not familiar with this method and I guess it works fine because the environment is generally freezing cold. Yet, I would like to know more. How fresh do scat samples need to be in order to be suitable for DNA metabarcoding analyses? Additionally, what do you mean by "scats were collected within camera trapping area", could you be more specific? Have you specified a buffer around camera stations?

      Thanks a lot for this specific inquiry! We followed the scat collection method described by Janecka et al. (2008; 2011). Silica is used to dry the scats to minimize DNA degradation. Because of field conditions, no suitable equipment was available to freeze samples during sampling, so the collected scat samples were kept dry and cool in the shade and transferred to the laboratory as soon as possible after sampling. We selected relatively fresh samples based on the color of the scat, and broke off small pieces from the outer part of the scat, including pieces not directly exposed to the sun. A piece of scat about the size of a pinkie nail is placed in each tube. If the tube is overfilled, the sample will likely not dry, leading to DNA degradation.

      The study area was subdivided into sample squares of 25 km2 (5×5 km) as a geographical reference for placing camera survey sites and collecting scat samples. Camera traps were set in areas believed to be important to and heavily used by wildlife, such as the bottoms of cliffs, sides of boulders, valleys and ridges along movement corridors. Also, we focused on sites with known or suspected carnivore activity to maximize probability of detection for scat samples. Therefore, transects were set around the infrared camera to collect scat samples. Length of each transect was determined by terrain, amount of scat, and available time. Each transect should have collected about 18 samples or covered 5 km of terrain to avoid uneven representation among transects and ensure that the team has sufficient time to return to base camp (Janečka et al., 2011).

      Janecka J, Jackson R, Yuquang Z, Li D, Munkhtsog B, Buckley-Beason V, Murphy W. 2008. Population monitoring of snow leopards using noninvasive collection of scat samples: A pilot study. Animal Conservation 11:401–411. doi:10.1111/j.1469-1795.2008.00195.x

      Janečka JE, Munkhtsog B, Jackson RM, Naranbaatar G, Mallon DP, Murphy WJ. 2011. Comparison of noninvasive genetic and camera-trapping techniques for surveying snow leopards. J Mammal 92:771–783. doi:10.1644/10-MAMM-A-036.1

      Kays R, Arbogast BS, Baker‐Whatton M, Beirne C, Boone HM, Bowler M, Burneo SF, Cove MV, Ding P, Espinosa S, Gonçalves ALS, Hansen CP, Jansen PA, Kolowski JM, Knowles TW, Lima MGM, Millspaugh J, McShea WJ, Pacifici K, Parsons AW, Pease BS, Rovero F, Santos F, Schuttler SG, Sheil D, Si X, Snider M, Spironello WR. 2020. An empirical evaluation of camera trap study design: How many, how long and when? Methods Ecol Evol 11:700–713. doi:10.1111/2041-210X.13370

      Regarding the discussion, the authors have information for 1) spatial distribution, 2) temporal overlap, 3) dietary requirement, they should use this information to support the discussion. Instead, sometimes it feels that authors go by exclusion or make a suggestion. For example: the authors have found dietary and temporal overlap between two apex predators (i.e., wolf and snow leopard), and they said that this suggests that spatial partitioning is responsible for their successful coexistence in this area (lines 195-196). But why "suggesting", what the co-occurrence metric says? Another example: "Apex carnivores and mesocarnivores showed substantial overlap in time overall, indicating that spatial and dietary partitioning may play a large role in facilitating their coexistence" (lines 241 - 242). However, this should not be a suggestion: your Sørensen similarity index is low proving spatial divergence. So, when data supports the hypotheses, the authors should be firmer in their discussion. Generally, when reading the discussion, it felt that a figure summarizing the partitioning would be much needed to digest which type of partitioning strategy the species are using.

      Thank you for your thoughtful comments and suggestions.

      (1) We appreciate your insights on the discussion section, particularly concerning the interpretation of our findings on spatial distribution and on temporal and dietary overlap. We acknowledge the need for clearer interpretation of our findings and have revised the discussion section to provide more direct support. For example, in lines 294-295, we modified the text to “We found dietary and temporal overlap among apex carnivores, showing that spatial partitioning is responsible for their successful coexistence in this area.” In lines 341-342, we modified the text to “Apex carnivores and mesocarnivores exhibited considerable overlap in time overall, showing that spatial and dietary partitioning may play a large role in facilitating their coexistence.”

      (2) We appreciate your suggestion regarding the inclusion of a figure summarizing partitioning strategies among species discussed. In our study, we organized the overlap index of space, time, and diet among carnivores in Table 3, which directly reflects the overlap of carnivore species in these three dimensions by summarizing them in a single table. Additionally, Figure 3 illustrates the activity patterns and overlap among species, while Figure 4 displays the primary prey of carnivores and the frequency of food utilization.

      About lines 228 - 229, just as a side note, the Pallas's cat, as the red fox, selects the environment according to a greater distribution of prey species, while also selecting primarily meadows and natural environment (Greco et al. 2022, Journal of wildlife management) additionally it is not strictly diurnal (Anile et al. 2020, Wildlife Research; Greco et al. 2022, Journal of wildlife management). Regarding the Pallas's cat and its exclusion from the temporal and spatial analyses, can you specify how many independent detection events you had?

      Thanks a lot for this valuable comment!

      (1) We appreciate the references to recent studies highlighting the habitat preferences and activity patterns of Pallas's cat. We have revised the manuscript to acknowledge these points and provide context regarding its habitat selection strategies. Specifically, we modified the text as follows: “Pallas’s cat hunts during crepuscular and diurnal periods and inhabits meadows with greater prey abundance (Anile et al., 2021; Greco et al., 2022; Ross et al., 2019).”

      (2) The low detection rate of Pallas's cat (0.072) identified by single-species occupancy model raised concerns regarding the reliability of the results. The estimated high standard errors for each environmental variable and the wide confidence intervals around the detection rate further indicated potential bias or randomness. Consequently, we made the decision to exclude the Pallas's cat data from further analysis. Upon closer examination of the Pallas's cat data, it became evident that out of 319 camera sites surveyed, only 27 sites detected the presence of Pallas's cat. Notably, only 3 out of 193 sites in Gansu Province recorded detections, while Qinghai Province had 24 detections out of 126 sites. This skewed distribution of data likely contributed to the unsatisfactory outcomes observed in our models.
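
      The imbalance between provinces can be made explicit by computing the proportion of sites with at least one Pallas's cat detection. The short Python sketch below uses only the counts quoted above. Note that this naive proportion of sites with detections is a different quantity from the model-estimated detection rate of 0.072; the function name is illustrative.

```python
def naive_site_proportion(sites_with_detection, sites_surveyed):
    """Proportion of surveyed sites with at least one detection."""
    if sites_surveyed <= 0:
        raise ValueError("sites_surveyed must be positive")
    return sites_with_detection / sites_surveyed


if __name__ == "__main__":
    # Counts as reported in the response: (sites with detections, sites surveyed).
    counts = {"Gansu": (3, 193), "Qinghai": (24, 126), "All sites": (27, 319)}
    for region, (detected, surveyed) in counts.items():
        print(f"{region}: {naive_site_proportion(detected, surveyed):.3f}")
```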

      About the diet and results of scat analyses, have you found any sign of intra-guild predation (i.e., apex predators that kill and sometimes consume subordinate carnivores to reduce competition), this could actually represent proof of competition and spatial overlap.

      Thanks a lot for your thoughtful comments!

      We observed intraguild predation in the diet of wolves and snow leopards. Specifically, we found Pallas’s cat, red fox, and Tibetan fox in the diet of wolves, and Pallas’s cat, Eurasian badger, and Tibetan fox in the diet of snow leopards. However, these intraguild predation events accounted for only 1.89% of the diet composition of apex carnivores. We suggest that the rarity of these observations may be influenced by various factors and does not necessarily provide sufficient evidence of competition and spatial overlap. Therefore, further data collection and in-depth research are needed to better understand this phenomenon.

      Some minor comments: Figure 2 is really nice, while some abbreviations are missing in the caption of Table 2.

      Thank you for your feedback and positive comments on Figure 2. Unfortunately, we have removed Figure 2 from the manuscript. Prey abundance and human disturbance are now included as occupancy covariates, and these variables were derived solely from infrared camera trap data rather than from a dataset covering the entire national park. Therefore, we could not accurately project carnivore occupancy probability spatially across the park.

      We apologize for the missing abbreviations in the caption of Table 2. We have added them to the caption as follows: “Abbreviations: Disrd-distance to roads, Ele-elevation, NDVI-normalized difference vegetation index, Rix-roughness index, hdis-human disturbance.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I would exclude the cryo-EM data from the manuscript. It does not add much and it is distracting from the excellent work that you did on the functional characterization of the variant. Alternatively, you could try to improve the resolution and see if you can get some more meaningful analysis out of the structures? I noticed that you only collected very small datasets. If you decide to pursue a higher resolution reconstruction, collecting more movies will give you a better chance to obtain a higher resolution.

      We express our gratitude to the reviewer for their invaluable feedback. While acknowledging that our structure currently maintains a low resolution, it still provides valuable insights into the splice's proximity to the N412 glycan density. This proximity and low-resolution map hindered the complete modeling of all the splice residues. Notably, this structure represents the first depiction of this particular splice variant. Consequently, it lays a foundation for subsequent studies in the field, and hence, we would want to keep it in the manuscript. As per reviewers’ suggestions, we have now included comparisons of our structure with the GluK1-2a receptor structure reported recently (Mayerson et al. 2022). We do plan to carry out higher-resolution structures in the future.

      I would probably also exclude the RNAseq analysis. I think that Figure 1 is fine, but the supplement 1 is not very successful in convincing me that the exon 9 is expressed mainly in early stages of brain development. In addition, the plot in Figure 1 indicates strong expression in the cerebellar cortex in 20s and 30s. If you decide to keep the data, I strongly encourage you to include more details on the analysis in the methods section.

      Thanks for this insightful comment. We have now modified this section extensively for better clarity. Indeed, the expression of this variant seems to be dynamic across brain regions, and this has now been specified in the revised manuscript. Figure 1 shows the expression of GRIK1 exon 9 across different regions of the human brain and donor ages. Supplementary Figure 1 zooms in on one such region, the cerebral cortex, where we observe the maximum expression of GRIK1. In this region, we also observed higher expression of exon 9 in the early stages of development. The scales of Figure 1 (0-4 RPKM) and Supplementary Figure 1 (0-6 RPKM) differ because other exons are expressed more strongly in Supplementary Figure 1 (for example, an expression of 4 RPKM appears in a shade of red in Figure 1, whereas similar values appear orange-yellow in Supplementary Figure 1). With Supplementary Figure 1, we wanted to show the expression of exon 9 relative to other exons across developmental stages, which shows that GluK1-1 is highly expressed in the initial stages of life. More details on the analysis have now been added to the Methods section.

      Additionally, there are a few minor issues in the data presentation:

      (1) In Fig. 2C there seems to be a mismatch between the green dose response plot and the GluK1-2a trace shown. The plot reports an EC50 of 187.7 uM, whereas in the sample trace 0.25 mM agonist activates only to ~20%.

      We have verified the data and statistics, confirming their consistency with the values reported in the manuscript. For Figure 2C, we present representative traces from a single cell. However, the EC50 value was calculated using Hill's equation based on averaged data from 5 cells.
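
      To make the reviewer's comparison concrete, the response expected at a given agonist concentration can be computed from the Hill equation. The sketch below is only an illustration and not the authors' analysis: the function name `hill_response` and the Hill coefficient of 1.0 are assumptions, and only the EC50 (187.7 µM) and the 0.25 mM concentration come from the text above.

```python
def hill_response(conc_um: float, ec50_um: float, hill_coeff: float = 1.0) -> float:
    """Fraction of the maximal response predicted by the Hill equation.

    conc_um and ec50_um must share the same unit (micromolar here).
    hill_coeff is an assumed value; the fitted coefficient is not given in the text.
    """
    if conc_um < 0 or ec50_um <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    ratio = (conc_um / ec50_um) ** hill_coeff
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # 0.25 mM = 250 uM, evaluated against the averaged EC50 of 187.7 uM.
    print(f"Predicted fraction of maximum: {hill_response(250.0, 187.7):.2f}")
```

      Under these assumptions, a cell whose own EC50 is higher than the population average would show a smaller fractional response at the same concentration, which is one way a single representative trace can differ from a fit to averaged data.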

      (2) The axis label is misprinted in Figure 3C

      Thanks. Corrected.

      (3) In Fig 5 supplement 1, panel B - the 3 last labels above the western blot lanes are off so it is difficult to see which sample corresponds to which lane.

      Thanks. We have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      Overall I congratulate the authors of this study nicely done. It represents a large body of work.

      We thank the reviewer for his/her time and positive comments.

      I have several minor corrections that authors could consider for the revision of the manuscript.

      P7. The desensitization rate of GluK1-2a was "delayed"... replace by "increased".

      Corrected.

      P9. Last line 0.37; P.. Add the P value.

      P value has been added as suggested.

      P11 authors indicate that K368/375/379/382H376-E mutant exhibit significant difference in desensitization properties in presence of Neto1, but on the 1st line of p11, they provide a P value above 0.05

      We thank the reviewer for pointing out this discrepancy, which we have now fixed. We discussed two mutants that show slower desensitization compared to GluK1-1a co-expressed with Neto1. The effect is significant for the K368-E mutant, while the τdes value for the K368/375/379/382H376-E mutant shows the same trend without reaching significance. We have now modified the text to explain this more clearly.

      P19 the calculation of mean weighted tau TDes is not clear and should be better explained.

      Thanks. We have added more details in the Methods sections. We analyzed the current decays in response to 1–2 ms or 1 s applications by employing an exponential function or the sum of two exponential functions. This analysis allowed us to derive a weighted mean τdes using the formula [(τ1 × amplitude1) + (τ2 × amplitude2)]/[amplitude1 + amplitude2]. The tau values represent the time constants obtained from the exponential fits, while the amplitudes correspond to the estimated contributions of each component to the total peak current amplitude.

      [(A1 * t1) + (A2 * t2)] / (A1 + A2)

      It represents the calculation of a weighted mean, where A1 and A2 are the amplitudes, and t1 and t2 are the corresponding time constants. The formula calculates the overall mean time constant by taking into account the contribution of each component to the total amplitude.
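
      The weighted-mean formula above can be written out as a short function. This is a minimal sketch: the function name `weighted_tau` and the numbers in the usage example are hypothetical and are not values from the manuscript.

```python
def weighted_tau(amp1: float, tau1: float, amp2: float, tau2: float) -> float:
    """Amplitude-weighted mean time constant of a two-exponential fit.

    Implements [(A1 * t1) + (A2 * t2)] / (A1 + A2).
    """
    total_amp = amp1 + amp2
    if total_amp == 0:
        raise ValueError("the summed amplitude must be non-zero")
    return (amp1 * tau1 + amp2 * tau2) / total_amp


if __name__ == "__main__":
    # Hypothetical fit: a fast component (300 pA, 5 ms) and a slow one (100 pA, 25 ms).
    print(weighted_tau(300.0, 5.0, 100.0, 25.0))
```

      When the decay is well described by a single exponential, the second amplitude is zero and the function simply returns the first time constant.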

      P19 the rate of recovery was obtained by fitting the one-phase association "with" exponential function. With is missing.

      We have corrected this error.  Thanks.

      P21 which method has been used for site directed mutagenesis

      Overlapping PCR was carried out for mutagenesis using the primers listed in Figure 4-table supplement 1. A ligation-free cloning approach (Zhang et al., 2017) was used. It has now been elaborated in the methodology section under Site directed mutagenesis.

      P21 and 22. Provide complete reference of reagent including species of antibodies.

      Thanks. We have added all the details in the methods section now. 

      Anti-His: Rabbit mAb #12698 (Cell Signaling Technology)

      Anti-Neto1: Rabbit #SAB3500679 (Sigma Aldrich)

      Anti-GFP: Mouse mAb G1546 (Sigma Aldrich)

      Anti-actin: Mouse mAb A3853 (Sigma Aldrich)

      P22 How much anti His antibody was used with 40microliter of protein A?

      We used 2 µg of antibody per 40 µL of Protein A slurry. This has now been added to the methodology.

      P23 Authors seem to have used a virus to express protein but the protocol is not given. For example what is P2 virus?

      We have now modified the manuscript to include details of baculovirus generation as per the protocol described in Goehring et al. 2014. We followed the same protocol wherein the 2nd generation of virus (P2) generated in insect (SF9) cells was used for infecting suspension-adapted HEK293-T cells for large-scale GluK1-1aEM protein expression.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The effect of the splice insert on Gluk1 regulation by Neto proteins is not fully clear. For example, experiments in Fig. 3G indicate that the desensitization time for Gluk1-1a + Neto2 is ~32ms. This value is half compared with data obtained from whole-cell experiments shown in Fig. 3A (~70ms). What is the reason for this discrepancy? If variability is observed between experiments, I wonder how valid are the comparisons made in panel A between GluK1-1a+Neto2 vs GluK1-2a+Neto2 groups. In the case of recovery analysis, authors found significant differences comparing both groups in the presence of Neto (Fig. 3B) but recovery times are not identical for Gluk1-1a vs Gluk1-2a (without Neto). Thus, I wonder if the fold change related to the control group (without Neto) is different.

      We appreciate your detailed feedback, which has allowed us to clarify and reinforce the validity of our experimental findings. Different recording configurations were used: outside-out patch (Fig. 3G) versus whole-cell recordings (Fig. 3A). Whole-cell recordings average responses over a larger membrane area and also have slower solution exchange times than outside-out patch recordings. This may have contributed to the variability in desensitization times. However, we observed similar trends in our whole-cell and outside-out patch recordings. Further, all the data except those presented in Figs 3G and 3H are from whole-cell recordings. We have performed multiple independent experiments and utilized rigorous statistical analyses to validate our comparisons. We report mean values with standard deviations or confidence intervals to provide a more accurate representation of the data.

      Neto1 significantly speeds up the recovery from desensitization for both variants, with a more pronounced effect on GluK1-1a (GluK1-1a +Neto1: 0.68 s) compared to GluK1-2a (GluK1-2a +Neto1: 1.15 s). The recovery times are not identical for the two variants, likely due to the presence of the splice insert in GluK1-1a. Neto2, on the other hand, slows recovery for both variants without significant differential effects. In the absence of Neto proteins, however, recovery from the desensitized state is faster for GluK1-1 than for GluK1-2a, although the difference is not significant.

      In the case of the glutamate concentration-response curve (Fig. 3C), EC50 values for Neto1 and Neto2 are relatively the same, but this approach on its own does not provide insights about the role of the splice insert. Previous experiments with the Gluk1 reveal differences between EC50 in the presence of Neto1 or 2 (Fisher, 2015), suggesting that the insert could regulate glutamate binding affinity, but still, this point is not directly demonstrated in this work.

      Thanks for this insightful comment. Indeed, we cannot conclude that splice residues directly affect glutamate sensitivity and have modified the text accordingly. The Fisher paper demonstrated that both Neto1 and Neto2 can influence glutamate sensitivity in GluK1-2a, with EC50 values of 124.6 ± 16.2 µM. Specifically, in the presence of Neto1 and Neto2, the EC50 values are 4.4 ± 0.4 µM and 13.7 ± 4.2 µM, respectively, indicating a noticeable effect that is not substantially different between GluK1-2a coexpressed with Neto1 and with Neto2. Our observations for GluK1-1a are similar, with both Neto1 and Neto2 producing a leftward shift.

      (2) Similar to the previous point, a proper interpretation of mutant data is missing in the manuscript. From current data, it is difficult to visualize the role of the insert on Neto-dependent regulation, mainly, because of the fact that some mutations alone affect Gluk1-1 channel properties. The authors conclude their data by stating that "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" (Page 13). However, this statement is confusing since the co-expression of Gluk1-1a with Neto2 (Fig. 5) prevents the effect caused by mutation K368 alone (Fig. 4), indicating that modulations by Neto 2 are indeed potentially affected by the mutations. Please, clarify. Also, the effect of the K368/375/379/382H376-E mutant on Neto modulation (pink bar in Fig. 5) is impossible to interpret properly since the effect of the mutation alone is not shown in the manuscript.

      Thanks for seeking this important clarification. It is indeed true that splice residue mutations themselves affect the receptor functional properties in comparison to the wild-type receptors. For the sake of clarity, we have presented the effect of splice mutants on receptor properties separately from the effect of mutations on modulation by Neto proteins. Figure 4 demonstrates a comparison between wild-type and mutant receptors without the Neto proteins, showcasing different kinetic properties, while Figure 5 provides detailed information on the role of the insert in Neto-dependent regulation. 

      It’s true we could not record the effect of the K368/375/379/382H376-E mutant alone or when coexpressed with Neto 2 due to low peak amplitudes (mentioned in Table 1) that prevented reliable comparisons. However, robust currents were observed when the same mutant was coexpressed with Neto1, and hence comparisons were shown for this mutant with GluK1-1a wild-type + Neto1. 

      We have now modified the statement "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" and the last paragraph as follows:

      “Neto1 appears to have more pronounced effects on the mutant receptors compared to Neto2. Specifically, Neto1 significantly slowed desensitization for the K368-E mutant, accelerated recovery from desensitization for K368-E and K368/375/379/382H376-E mutants, increased agonist efficacy for K368-E and K375/379/382H376-E mutants, and altered rectification properties for K368-E and K368/375/379/382H376-E mutants. In contrast, Neto2 had fewer significant effects on the mutant receptors, with the main impact being an increase in agonist efficacy for the K368-E mutant. Notably, Neto2 did not significantly affect desensitization, recovery from desensitization, or rectification properties of the mutant receptors when compared with wild-type GluK1-1a coexpressed with Neto2. These findings suggest that the splice residues in GluK1-1a differentially influence receptor modulation by Neto1 and Neto2, with Neto1 showing more extensive modulation of the mutant receptors' functional properties.”

      (3) An open question after reading this interesting work is if the proposed change in Neto regulation because of the splice insert is due to changes in Gluk1-Neto interactions or because the rearrangement after interaction with Neto proteins is different. Pull-down experiments (Fig 5 Sup.1) suggest that the splice insert and all the mutants tested do not prevent interaction with Neto proteins. I wonder if the authors could complement their data with a quantitative approach/analysis to demonstrate if the splice insert and the mutants affect Neto1/2 interactions (as expected for the rationale when creating the mutants).

      Thank you for this insightful suggestion. You raise an important point about distinguishing between changes in GluK1-Neto interactions and potential differences in receptor rearrangement after Neto binding. While our pull-down experiments suggest that the splice insert and mutants don't prevent Neto interactions (probably due to a larger interaction interface all along the receptor), a quantitative approach would indeed provide more nuanced information. In future studies, we plan to use a quantitative approach such as surface plasmon resonance to assess the changes in interactions upon mutations in the splice insert and/or Neto proteins in different states of the receptor. In addition, obtaining cryo-EM structures of GluK1 splice variants in complex with Neto1 and Neto2 would provide crucial insights into their interaction interfaces and any conformational changes induced by binding.

      (4) Related to the Gluk1-1a structure, the authors state that the overall structure is similar to the one without the insert (page 14); however, this is not properly shown in the manuscript. Even if the overall architecture of the channel is the same, authors should make a proper/adequate comparison between both structures/domains to support their claims. Also, one should expect that the insertion of 15 amino acids would affect in some way the closing neighboring domains. The differential effect of the splice insert on glutamate and kainate EC50 values (Fig. 2 and Fig. 2 sup.1), suggests that the insert could introduce a sort of rearrangement in the binding domain. Thus, I wonder if a more elaborated analysis of the current structural data could reveal some structural insights that would explain the specific functional differences due to the splice insert. If the low resolution and the missing residues avoid making some comparisons and establish differences between sidechain orientations, still, a proper comparison between the domain backbones would be helpful to validate the author's statement at least. Also, I wonder if the changes could be resolved better in a closed state or APO structure, instead of the desensitized structure. Finally, are the structures obtained in DDM and nanodiscs similar?

      As per the reviewer’s suggestion, we have now added a new figure in the supplementary information, “Figure 6-figure supplement 9,” where we show a superimposition of GluK1-1aEM (detergent-solubilized or reconstituted in nanodiscs) and GluK1-2a (PDB:7LVT; silver) showing overall conservation of the structures in the desensitized state.

      As evident from the figure and the RMSD values mentioned above, we do not observe significant movements in either the ATD or the LBD layer of GluK1-1a with respect to GluK1-2a. As can also be observed, the DDM-solubilized and nanodisc-reconstituted GluK1-1a structures (Panel A) are very similar, with an RMSD of ~2.19 Å across all 2664 Cα atom pairs. Due to the low resolution of our structures, we have refrained from carrying out detailed structural comparisons.
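
      For readers unfamiliar with the metric, an RMSD over matched Cα atom pairs can be computed as sketched below. This is an illustration only: it assumes the two coordinate lists are already superimposed and paired atom by atom, it performs no structural alignment, and the function name `rmsd` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long lists of paired atoms.

    The structures are assumed to be superimposed beforehand; this function
    only measures the remaining per-atom distances.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical atom pairs, each displaced by 2 angstroms along z.
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]))
```

      In practice the superposition and the RMSD are usually produced together by a structure-analysis package; the sketch only shows what the reported number measures.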

      Our efforts to capture the closed state or apo state structures have failed due to either severe orientation bias (only top views) or increased heterogeneity. 

      (5) Methods section lacks relevant information for proper data interpretation as well as for replicating some experiments in the future. For example:

      A) The experimental design to determine the rectification index with a Ramp protocol is not clear: 1) Why the authors applied a ramp protocol if receptors desensitize along the time? Please clarify the protocol.

      Ramp protocols were used only for the wild-type receptors to compare their voltage-dependent behavior, as this was the first study to compare the two splice variants. All kainate receptors (GluK1-GluK5) desensitize over time. However, their rectification properties have been studied previously (both in the absence and presence of Neto proteins) using ramp protocols, as they are faster than step protocols.

      B) Are polyamines included in the solutions to perform the rectification assays?

      No, polyamines were not added to the intracellular solution, and the effect of the endogenous polyamine block was measured. This has now been specified in the results as well as the methods section.

      C) It is not clear if the experiments to calculate IK/IG ratios were performed in the same preparation (This is, the same cell was stimulated with glutamate and then kainate or vice versa).

      Indeed, the current responses for glutamate vs kainate are performed in the same cell (the same cell was stimulated by glutamate then kainate) so that the responses can be compared. It’s now been specified in the methods section.

      D) The experimental design for calculating recovery is not clear.

      We employed a double pulse protocol to measure receptor recovery. The protocol involved applying two consecutive pulses of agonist stimulation to the receptor. Initially, we applied a brief agonist pulse to activate the receptor, followed by a specific recovery period. After the recovery period, we administered a second agonist pulse to assess the receptor's recovery response. The receptor's recovery was determined by comparing the response amplitude of the second pulse to that of the first pulse, providing valuable insights into the receptor's recovery kinetics. Recovery rates were calculated with single exponential association fits in Prism. We have now modified the text for better clarity.
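
      The double-pulse analysis described above can be sketched in a few lines. This is a simplified illustration, not the Prism procedure used by the authors: it assumes recovery starts at zero and plateaus at 1.0, so that f(t) = 1 - exp(-t / τ), and it estimates τ from a linearized least-squares fit instead of a nonlinear one. The function names and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def recovery_fraction(first_peak: float, second_peak: float) -> float:
    """Ratio of the second-pulse peak amplitude to the first-pulse peak amplitude."""
    if first_peak == 0:
        raise ValueError("first-pulse peak must be non-zero")
    return second_peak / first_peak


def fit_recovery_tau(intervals_s: Sequence[float], fractions: Sequence[float]) -> float:
    """Estimate tau (in seconds) for f(t) = 1 - exp(-t / tau).

    Uses the linearized form -ln(1 - f) = t / tau and a least-squares line
    through the origin, whose slope equals 1 / tau.
    """
    num = 0.0
    den = 0.0
    for t, f in zip(intervals_s, fractions):
        if not 0.0 <= f < 1.0:
            continue  # the logarithm is undefined at or beyond full recovery
        num += t * -math.log(1.0 - f)
        den += t * t
    if den == 0.0 or num <= 0.0:
        raise ValueError("not enough usable points to estimate tau")
    return den / num  # reciprocal of the fitted slope


if __name__ == "__main__":
    # Hypothetical inter-pulse intervals (s) and peak amplitudes (pA).
    intervals = [0.5, 1.0, 2.0, 4.0]
    first_peaks = [-400.0, -410.0, -395.0, -405.0]
    second_peaks = [-90.0, -160.0, -250.0, -350.0]
    fractions = [recovery_fraction(p1, p2) for p1, p2 in zip(first_peaks, second_peaks)]
    print(f"Estimated tau: {fit_recovery_tau(intervals, fractions):.2f} s")
```

      The linearization weights late time points more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different estimates on noisy data.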

      E) Please indicate the species used for both functional and Cryo-EM (rat Gluk1 isoform?).

      Thanks for pointing this out. We have now specified in relevant methodology sections that Rattus norvegicus GluK1 and Neto proteins were used in this study.

      F) Please describe the nanodisc reconstitution protocol and how the nanodisc protein was purified, if appropriate.

      The MSP1E3D1 was purified by following the protocol given by the Sligar group in 2014 (doi: 10.1016/S0076-6879(09)64011-8). The nanodisc reconstitution protocol has now been elaborated in the revised manuscript.

      G) Site-directed mutagenesis methodology is incomplete. Please check.

      We have now elaborated this section to include more details.

      Minor concerns:

      (1) Authors state that splice residues are ~30A away from the TM domain. Currently, there is no friendly representation showing the localization of the splice in the structure, besides Fig.6E. The manuscript could benefit itself if authors include a better 3D representation or a scheme to highlight the position of the splice relative to critical domains.

      Thanks for pointing this out. The distance between TRP 381 CA (ATD) and LEU 636 CA (TM3) is 92.10 Å. We have changed the value in the text to ~92 Å.

      Author response image 1.

      (2) Authors mention that mutations in the insert to alanine show normal traffic to the plasma membrane but low current amplitude. Then, I wonder if single-channel conductance, mean open time or open probability is affected by the splice insert. Showing the effects of the insert on single-channel properties would strengthen the manuscript's quality.

      It is a good suggestion. However, as can be observed from our whole-cell and outside-out patch data, we obtained low peak amplitudes (<50 pA) for many of our receptor-only constructs and also saw high SEM for some recordings due to heterogeneity between cells of the same population. We will consider studying the single-channel properties of these receptors in future experiments.

      (3) It is unclear how the insert or the mutations specifically affect glutamate- or kainate-induced responses because authors analyze IK/IG ratios only. Maybe authors could consider including an analysis of the role of the insert on specific glutamate- or kainate-induced response to gain insights about ligand selectivity.

      All the values have been included in the Excel file of raw data. We have included the desensitization kinetics of mutant receptors in the presence of glutamate and compared them to wild-type GluK1-1a. Kainate-induced responses were very heterogeneous (high SEM for % desensitization) and hence have not been included in the main data.

      (4) Please be consistent with nomenclature along the manuscript to avoid confusion. For example, Are Gluk-1-1 and Gluk-1-1a referring to the same variant?

      GluK1-1 is used in the abstract and the introduction, where we introduce the N-terminal splice variant that either has the 15 residues (termed GluK1-1) or lacks them (GluK1-2). The C-terminal splice variants of GluK1 are named “a-d”, with “a” having the shortest C-terminal domain. Later in the manuscript, we use only the GluK1-1a terminology to represent the ATD splice variant with the shortest C-terminal domain.

      The introduction and spatiotemporal results talk about the GluK1-1 receptors wherein the 

      (5) Legend figure 2: Repeated phrase should be removed. Please check.

      (6) Page 8: "This is similar to the effect observed in GluK1-2 receptors whereby the glutamate EC50 was shown to increase by Neto proteins [Neto1: 34-fold and Neto2: 7.5-fold (Palacios-Filardo et al., 2016) and Neto1/2: 10-30X (Fisher, 2015)]". It seems that values from Fisher's paper are backward. Please correct. 

      (7) Page 9. Second paragraph. Spelling mistake when referring to Fig. 3G.

      Thanks for pointing out the inadvertent errors; we have now corrected all of them.

      (8) Figure 3: The title in Y axis overlaps with the figure. Please check.

      We have corrected the error.

      (9) Page 10: "In addition, K375/379/382H376-E mutant also exhibited a slowdown in the recovery (K375/379/382H376-E: 4.83 ± 0.31 s P=0.2774) (Figure 4C; Table 1)." Statistical analysis indicates this is not correct. Please tone down this statement. For example: "...mutant also exhibited a trend to a slowdown in the recovery although differences do not reach statistical significance".

      Thanks. We have modified the statement as suggested.

      (10) Page 11: "and a reduction was observed for K375/379/382H376-E receptors (1.17 ± 0.28 P=0.3733) compared to wild-type (Figure 4D; Table 1)." Same issue as the previous minor comment.

      Thanks. We have modified the statement as suggested.

      (11) Page 11: "We observed that mutants K368-E and K368/375/379/382H376-E, desensitize significantly slower in the presence of Neto1" This statement is not true for K368/375/379/382H376-E mutant. Please correct.

      Thanks. We have modified the statement as suggested and specified the difference.

      (12) Legend Figure 4. Colored asterisks are not clear in the figure. Please check.

      Thanks. The reference to colored asterisks has been removed from the legend as they are not used.

      (13) Representative data shown in Fig 5 sup.2A do not match very well with the final quantification shown in Fig 5A. Please check. Also, the authors state in the result section (page 10) that data shown in Fig. 5A indicate that "GluK1-1a modulation by Neto 1 is influenced by the splice residues". This could be true only for residue K368; however, this is not so obvious since the two mutants containing K368E are inconsistent. Please check and clarify.

      Only representative traces are shown in Fig 5 sup 2 A. However, the quantification shown in Fig 5 A is from multiple cells. We have rechecked all the data and found it to be consistent. We have rewritten this section and modified it for better clarity.

      (14) Figure 6-supplement 2: Please incorporate missing values of MW standards in panel B.

      Thanks. We have modified the figure to include values for MW standards.

      (15) It is not clear the rationale for showing construct C552Y C557V C575S in Fig. 6 sup.3, panel A. This mutant is not mentioned in the manuscript.

      It has been mentioned in the methodology section under “Construct design for expression and purification of rat GluK1-1aEM”. It (C552Y C557V C576S) is one of the constructs used in optimizations that were checked for good protein yields. Based on FSEC protein profiles, we used C552Y, C557V (2X Cys mutant) as GluK1-1aEM, which is mentioned in the same section.

      (16) Fig. 6 sup.4 Not clear what does mean w.r.c. Please specify in the legend.

      With respect to (w. r. t.) has been specified in the manuscript.

      (17) Suggestion to improve data presentation in Fig. 4D and Fig. 3 sup.1B: For easier comparison of IK/IG ratios, representative traces for kainate and glutamate in the same group could be shown using the same Y-scale.

      It has been purposely shown with two different Y-scales due to the differences in peak amplitudes in the presence of glutamate or kainate. 

      (18) Fig. 3 sup.1A: Based on the figure legend, horizontal bars representing the application of glutamate are not consistent with time scale bars. Please, check. In the same figure, panel B, the representative traces shown for GluK-1a-Neto1 are not consistent with IK/IG ratio shown in Fig. 3D.

      Thanks, we have corrected the horizontal bars representing glutamate application. The representative traces shown for GluK-1a-Neto1 were rechecked and are consistent with the IK/IG ratio shown in Fig. 3D.

      (19) I wonder if the authors could discuss the lack of Neto1 effect on the wild type Gluk1-2a channel, as proposed previously.

      Sheng et al., 2015 showed that Neto1 enhances the desensitization onset of GluK1. However, it is unclear which GluK1 splice variants were used in that study. GluK1 has several splice variants, but in the present study, we specifically compared GluK1-1a and GluK1-2a. In our case, we did not observe an effect of Neto1 on wild-type GluK1-2a with either of the two techniques (whole-cell and outside-out patch) used in our study. However, as can be observed from our data, the GluK1-2a receptor alone shows faster desensitization kinetics than reported in the previous study (Copits et al., 2011). The differences could stem from different experimental conditions, such as the constructs and recording conditions used.

      Copits BA, Robbins JS, Frausto S, Swanson GT. Synaptic targeting and functional modulation of GluK1 kainate receptors by the auxiliary neuropilin and tolloid-like (NETO) proteins. Journal of Neuroscience. 2011 May 18;31(20):7334-40.

      Sheng N, Shi YS, Lomash RM, Roche KW, Nicoll RA. Neto auxiliary proteins control both the trafficking and biophysical properties of the kainate receptor GluK1. Elife. 2015 Dec 31;4:e11682. doi: 10.7554/eLife.11682. PMID: 26720915; PMCID: PMC4749551.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, Gerlevik et al. performed an integrative analysis of clinical, genetic and transcriptomic data to identify MDS subgroups with distinct outcomes. The study was based on the building of an "immunoscore" and then combined with genotype and clinical data to analyze patient outcomes using multi-omics factor analysis. 

      Strengths: Integrative analysis of RNA-seq, genotyping and clinical data 

      Weaknesses: Validation of the bioinformatic pipeline is incomplete 

      Major comments: 

      (1) This study considered two RNA-seq data sets publicly available and generated in two distinct laboratories. Are they comparable in terms of RNA-seq technique: polyA versus rRNA depletion, paired-end sequencing, fragment length? 

      We want to reemphasize that the main point of this study is not to compare the BMMNC with the HSPC cohort. These datasets are not comparable because they were collected from different cell types, and we should not expect them to match. We analysed them in parallel only to check how much HSPCs contribute to the molecular signatures we see in BMMNC samples. However, we agree with the reviewer that similar RNA-seq experimental techniques should be employed to control for confounding factors. Here is the information that we found for the HSPC and BMMNC RNA-seq studies:

      HSPC RNA-seq cohort: Total RNA was extracted using TRIzol (Thermo Scientific), and sequencing was performed on an Illumina HiSeq4000 with 100-bp paired-end reads.

      BMMNC RNA-seq cohort: The RNA was extracted with TRIzol reagent (Thermo Scientific). RNA-sequencing libraries were prepared from poly(A)-selected RNA and were sequenced on the Illumina HiSeq 2000 or 2500 platform with 100-bp paired-end reads.

      The only difference between the two cohorts is that one cohort includes total RNAs, whereas the other has polyA-selected RNAs. Since the gene set signatures use the expression of protein-coding genes, which all have polyA tails and are included in total RNA libraries, the analysis will not be affected by total vs. polyA-selected RNA-seq techniques.

      (2) Data quality control (figure 1): the authors must show in a graph whether the features (dimensions) of factor 1 were available for each BMMNC and CD34+ samples.  

      By features of Factor 1, we think the reviewer means the features with high weights for Factor 1 in BMMNC and CD34+ samples. Figure 2c-d clearly illustrates the important features and their associations with Factor 1 for all samples in both cohorts. The samples are the columns of the two heatmaps.

      (3) How to validate the importance of "immunoscore"? If GSEA of RNA-seq data was performed in the entire cohort, in the SF3B1-mutated samples or SRSF2-mutated samples (instead of patients having a high versus low level of factor 1 shown in Sup Fig. 4), what would be the ranking of Hallmarks or Reactome inflammatory terms among the others? 

      Our GSEA analysis was an attempt to validate the importance of our identified factors. As described in the paper, Factor 1 represents a combination of immunology scores (or “immunoscores”) in the CD34+ cohort. Applying GSEA, we identified upregulation of inflammation-related pathways, chemokines, and neutrophils in patients with high (4th quartile) versus low (1st quartile) levels of Factor 1. Interestingly, sorting patients by Factor 1 resulted in a similar pattern based on gene signature scores (Figure 2d).
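
      The high-versus-low grouping used for this comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `split_by_quartile`, the sample identifiers, and the factor values are hypothetical, and the quartile cut points use the default "exclusive" method of Python's `statistics.quantiles`, which may differ from the method used in the study.

```python
import statistics
from typing import Dict, List, Tuple


def split_by_quartile(factor_values: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (low, high) sample IDs: 1st-quartile and 4th-quartile samples.

    Samples at or below the first cut point are "low"; samples at or above
    the third cut point are "high". Everything in between is left out.
    """
    q1, _median, q3 = statistics.quantiles(factor_values.values(), n=4)
    low = sorted(s for s, v in factor_values.items() if v <= q1)
    high = sorted(s for s, v in factor_values.items() if v >= q3)
    return low, high


if __name__ == "__main__":
    # Hypothetical Factor 1 values for eight samples.
    values = {f"sample_{i}": float(i) for i in range(1, 9)}
    low_group, high_group = split_by_quartile(values)
    print("low:", low_group)
    print("high:", high_group)
```

      The two resulting groups would then serve as the phenotype labels for the enrichment analysis, with the middle half of the samples excluded from the comparison.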

      To show that Factor 1 generated by MOFA is important and different from known MDS categories such as SF3B1 and SRSF2 mutants, we performed GSEA in SF3B1-mutated vs. SF3B1-WT samples and SRSF2-mutated vs. SRSF2-WT samples in the CD34+ cohort. As shown in Author response image 1, we did not see the upregulation of inflammation and interferon pathways in SF3B1 and SRSF2 mutant MDS.

      Author response image 1.

      GSEA showed no upregulation of inflammation and interferon pathways for SF3B1 and SRSF2 mutants in the CD34+ cohort.

      (4) To decipher cell-type composition of BMMNC and CD34+ samples, the authors used van Galen's data (2019; supplementary table 3). Cell composition is expressed as the proportion of each cell population among the others. Surprisingly, the authors found that the promonocyte-like score was increased in SF3B1-mutated samples and not in SRSF2-mutated samples, which are frequently co-mutated with TET2 and associated with a CMML-like phenotype. Is there a risk of bias if bone marrow subpopulations such as megakaryocytic-erythroid progenitors or early erythroid precursors are not considered?

      We thank the reviewer for their insightful comment about CMML and the high prevalence of SRSF2 mutation (> 45%) in CMML cases. Using single-cell RNA sequencing and high-parameter flow cytometry, Ferrall-Fairbanks et al. (DOI: 10.1158/2643-3230.BCD-21-0217) recently showed that CMML can be classified into three differentiation trajectories: monocytic, megakaryocyte-erythroid progenitor (MEP), and normal-like. One hallmark of monocytic-biased trajectory was the enrichment of inflammatory granulocyte–macrophage progenitor (GMP)-like cells, which we observed through our analysis for SRSF2 mutants (Figure 6a).

      Unfortunately, van Galen's data does not provide any gene set for MEP, and there is no single-cell RNA-seq atlas for MDS to employ to calculate the MEP score. Also, we compared the Promono-like and GMP-like gene sets from van Galen's data, and we could not find any overlap, meaning that Promono-like is not specific enough to capture the signatures coming from the more differentiated progenitors such as GMPs. Therefore, as described in the paper, we focused on GMP-like rather than Promono-like.

      (5) Figures 2a and 2b indicated that the nature of retrotransposons identified in BMMNC and CD34+ was different. ERVs were not detected in CD34+ cells. Are ERVs not reactivated in CD34+ cells? Is there a bias in the sequencing or bioinformatic method?  

      As described above, the two cohorts' sequencing methods, read length, etc., are identical. CD34+ RNA-seq is total RNA-seq that includes both polyA and non-polyA RTE transcripts. Therefore, the chance of bias and missing RTE signatures in the CD34+ cohort is very low. L1 and Alu, which are shared between the two cohorts, are the two RTE families that are still active and make new insertions in humans. Our interpretation is that ERV activation in BM is associated with immune cells. As shown by Au et al. (DOI: 10.1016/j.ccell.2021.10.001), several ERV loci had expression in purified immune cell subsets in renal cell carcinoma samples, potentially explaining ERV upregulation in tumours responding to treatment as those biopsies had increased tumour infiltration.

      (6) What is the impact of factor 1 on survival? Is it different between BMMNC and CD34+ cells considering the distinct composition of factor 1 in CD34+ and BMMNC? 

      As shown in Table 1, Factor 1 in the BMMNC cohort is associated with overall survival (P-val < 0.05) in multivariate analysis but not in univariate analysis. We did not observe any association between Factor 1 and event-free survival in the BMMNC cohort. Also, the 10 factors identified by MOFA in the BM CD34+ cohort did not show any significant association with MDS overall survival (Supplementary Table 5). 

      (7) In Figure 1e, genotype contributed to the variance in the CD34+ cell analyses more importantly than in the BMMNC. Because the patients are different in the two cohorts, differences in the variance could be explained either by a greater variability of the type of mutations in CD34 or an increased frequency of poor prognosis mutations in CD34+ compared to BMMNC. The genotyping data must be shown.  

      The genotype has already been reported in Supplementary Table 2. In fact, the number of inspected genes was much higher in the BMMNC cohort (17 genes) compared to the CD34+ cohort (3 genes). Therefore, we have greater variability of the type of mutations in the BMMNC cohort compared to the CD34+ cohort. For the CD34+ cohort, we only had mutations for three spliceosome genes, where most cases (n=28) were SF3B1 mutants with good prognosis. We think that the result makes sense because the less genetic variability there is, the more homogeneous the groups are and the greater the chance that one factor or a group of factors can explain the genetic variance.   

      (8) Fig. 2a-b: Features with high weight are shown for each factor. For factor 9, features seemed to have a low weight (Fig. 1b and 1c). However, factor 9 was predictive of EFS and OS in the BMMNC cohort. What are the features driving the prognostic value of factor 9? 

      As shown in Figure 3b, the main features are RTE expression from the LTR:ERV1, SINE:MIR, and SINE:Alu families.  

      (9) The authors also provided microarray analyses of CD34+ cell. It could be interesting to test more broadly the correlation between features identified by RNA-seq or microarrays. 

      The microarray data did not come with any genetic information or clinical data except survival information. Therefore, we could not apply MOFA to the microarray data. However, we did generate gene signature scores from the microarray data and investigated the relationship of inflammatory chemokine and cytokine scores and IFN-I signature scores with MDS survival (Figures 3c and 4c).    

      (10) The authors should discuss the relevance of immunosenescence features in the context of SRSF2 mutation and extend the discussion to the interest of their pipeline for patient diagnosis and follow up under treatments. 

      We have added the below text to the discussion:

      Recent studies have shown that the expression of programmed death-ligand 1 (PD-L1) protein is significantly elevated in senescent cells (DOIs: 10.1128/mcb.00171-22, 10.1172/JCI156250, 10.1038/s41586-022-05388-4). Increased PD-L1 protein levels protect senescent cells from being cleared by cytotoxic immune cells that express the PD-1 checkpoint receptor. In fact, activation of the PD-1 receptor inhibits the cytotoxic capabilities of CD8+ T and NK cells, increasing immunosenescence.   

      Notably, patients with MDS who possess particular somatic mutations, such as those in the TP53, ASXL1, SETBP1, TET2, SRSF2, and RUNX1 genes, have an increased propensity to react favourably to PD-1/PD-L1 inhibitors (DOIs: 10.1111/bjh.17689, https://doi.org/10.1182/blood2020-141100) confirming that many cellular and molecular mechanisms, known to promote cellular senescence, including alteration of splicing machinery, are crucial stimulators of the expression of PD-L1 protein. Interestingly, in our analysis, we also observed a correlation between the senescence gene signature score and the expression of the PD-L1 gene in CD34+ cells (Supplementary Figure 7), supporting the previous findings linking PD-L1 gene expression to cellular senescence.

      The immunology and ageing features extracted from the MDS transcriptomic data used in our analysis pipeline can enhance the conventional risk-scoring systems for MDS by providing new insights into this disease, particularly in the context of inflammation and ageing. For some patients, the clinical and genetic features may remain relatively the same until follow-up. Still, the transcriptomic features might differ considerably from the baseline diagnosis, affecting the course of treatment.    

      Reviewer #2 (Public Review): 

      The authors performed a Multi-Omics Factor Analysis (MOFA) of two published MDS patient cohorts, one from bone marrow mononuclear cells (BMMNCs) and CD34 cells (ref 17) and another from CD34+ cells (ref 15), with three data modalities (clinical, genotype, and transcriptomics). Seven different views, including immune profile, inflammation/aging, Retrotransposon (RTE) expression, and cell-type composition, were derived from these modalities to attempt to identify the latent factors with significant impact on MDS prognosis. 

      SF3B1 was found to be the only mutation among 13 mutations in the BMMNC cohort that indicated a significant association with high inflammation. This trend was also observed to a lesser extent in the CD34+ cohort. The MOFA factor representing inflammation showed a good prognosis for MDS patients with high inflammation. In contrast, SRSF2 mutant cases showed a granulocyte-monocyte progenitor (GMP) pattern and high levels of senescence, immunosenescence, and malignant myeloid cells, consistent with their poor prognosis. Also, MOFA identified RTE expression as a risk factor for MDS. They proposed that this work showed the efficacy of their integrative approach to assess MDS prognostic risk that 'goes beyond all the scoring systems described thus far for MDS'. 

      Several issues need clarification and response: 

      (1) The authors do not provide adequate known clinical and molecular information which demonstrates prognostic risk of their sample cohorts in order to determine whether their data and approach 'goes beyond all the scoring systems described thus far for MDS'. For example, what data have the authors that their features provide prognostic data independent of the prior known factors related to prognosis (eg, marrow blasts, mutational, cytogenetic features, ring sideroblasts, IPSS-R, IPSS-M, MDA-SS)? 

      We agree with the reviewer that we did not generate a new cumulative risk score and compare it with the conventional risk scores for MDS. However, we identified individual MOFA factors, which are risk or protective factors for MDS, based on survival analysis in the BMMNC cohort. One reason that we did not generate our independent, cumulative score and compare it with other scores was that we did not receive any conventional risk score for the BMMNC cohort. However, we had access to all the clinical and genetic variables from the BMMNC cohort (except for three patients) that were required to calculate IPSS-R; hence, we calculated the IPSS-R in our resubmission for the BMMNC cohort. We made three IPSS-R risk categories by combining low and very low as low risk, and high and very high as high risk, and keeping intermediate as intermediate risk. Our survival analysis of these three categories showed a clear match between IPSS-R score and MDS survival (Author response image 2a).
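
      The three-group collapse described above can be written as a simple lookup. This is a minimal sketch for illustration, not the code used in our analysis: the category spellings and the names `IPSSR_TO_GROUP` and `collapse_ipssr` are assumptions.

```python
# Map the five standard IPSS-R categories onto the three groups used here:
# very low + low -> Low, intermediate -> Intermediate, high + very high -> High.
IPSSR_TO_GROUP = {
    "Very low": "Low",
    "Low": "Low",
    "Intermediate": "Intermediate",
    "High": "High",
    "Very high": "High",
}


def collapse_ipssr(category):
    """Return the collapsed risk group for one IPSS-R category label."""
    return IPSSR_TO_GROUP[category]


# Hypothetical per-patient categories (assumption, not cohort data).
example = ["Very low", "Intermediate", "Very high", "Low"]
groups = [collapse_ipssr(c) for c in example]
```

      An unknown label raises a `KeyError` rather than being silently assigned, which suits patients whose IPSS-R could not be calculated.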

      We then investigated the relationship between factors 2, 4, and 9 from MOFA and the three IPSS-R risk groups. Integration of IPSS-R risk groups with factor values confirmed the finding in the manuscript that Factors 4 and 9 generally exert a protective influence over the MDS risk, whilst higher levels of Factor 2 predict a high-risk MDS (Author response image 2b). However, we see many outliers in all three factors, indicating that some patients were assigned to the wrong IPSS-R categories because IPSS-R calculation is based on clinical and genetic variables and does not include the transcriptomics data for coding and non-coding genomic regions. 

      Author response image 2.

      Comparison of IPSS-R risk categories and MOFA risk and protective factors.

      (2) A major issue in analyzing this paper relates to the specific patient composition from whom the samples and data were obtained. The cohort from the Shiozawa paper (ref 17) is comprised of a substantial number of CMML patients. Thus, what evidence have the authors that much of the data from the BMMNCs from these patients and mutant SRSF2 related predominantly to their monocytic differentiation state?  

      We thank the reviewer for the insightful comment about the monocytic differentiation state of CMML and SRSF2 mutant cases. The BMMNC cohort has 11 CMML and 17 SRSF2 mutant cases, of which six are shared between the two groups. We have divided the patients into four groups: CMML only, SRSF2 mutant only, CMML and SRSF2 mutant, and others. We have generated boxplots for all cellular composition gene signature scores for these groups and compared the scores between these groups. As explained above, Ferrall-Fairbanks et al. (DOI: 10.1158/2643-3230.BCD-21-0217) recently showed that CMML can be classified into three differentiation trajectories: monocytic, megakaryocyte-erythroid progenitor (MEP), and normal-like. One hallmark of monocytic-biased trajectory was the enrichment of inflammatory granulocyte–macrophage progenitor (GMP)-like cells, which we observed through our analysis for the CMML cases with SRSF2 mutation (Author response image 3).

      Author response image 3.

      Cellular composition gene signature scores for CMML and SRSF2 mutant versus other cases. CMML cases with SRSF2 mutation show significantly higher levels of GMP and GMP-like scores compared to other MDS cases.  

      (3) In addition, as the majority of patients in the Shiozawa paper have ring sideroblasts (n=59), thus potentially skewing the data toward consideration mainly of these patients, for whom better outcomes are well known.  

      We disagree with the reviewer. We used 94 BMMNC samples from Shiozawa’s paper, of which 19 cases had Refractory Anemia with Ring Sideroblasts (RARS), 4 cases had Refractory Anemia with Ring Sideroblasts and thrombocytosis (RARS-T), and 5 cases had Refractory cytopenia with multilineage dysplasia and ring sideroblasts (RCMD-RS). In total, we had 28 cases (~30%) with Ring Sideroblasts (RS), a proportion that is not large enough to skew the data.
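
      The proportion quoted above follows directly from the subtype counts. The short check below only reproduces that arithmetic from the numbers in this response; the variable names are illustrative.

```python
# Ring-sideroblast subtype counts in the BMMNC cohort, as stated in the response.
rs_cases = {"RARS": 19, "RARS-T": 4, "RCMD-RS": 5}
total_bmmnc = 94  # BMMNC samples used from the Shiozawa cohort

rs_total = sum(rs_cases.values())      # cases with ring sideroblasts
rs_fraction = rs_total / total_bmmnc   # share of the cohort
print(f"{rs_total} of {total_bmmnc} cases ({rs_fraction:.1%}) have ring sideroblasts")
```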

      (4) Further, regarding this patient subset, what evidence have the authors that the importance of the SF3B1 mutation was merely related to the preponderance of sideroblastic patients from whom the samples were analyzed? 

      We had 34 SF3B1 mutant cases, of which 25 had Ring Sideroblasts (RS). The total number of cases with RS in the BMMNC cohort was 28. Therefore, the BMMNC cohort is not an RS-dominant cohort, and RS cases did not include all SF3B1 mutants. Furthermore, it was recently shown by Ochi et al. (DOI: 10.1038/s41598-022-18921-2) that RS is a consequence of the SF3B1 K700E mutation rather than a cause, so the presence of RS does not account for the importance of SF3B1.

      (5) An Erratum was reported for the Shiozawa paper (Shiozawa Y, Malcovati L, Gallì A, et al. Gene expression and risk of leukemic transformation in myelodysplasia. Blood. 2018 Aug 23;132(8):869-875. doi: 10.1182/blood-2018-07-863134) that resulted from a coding error in the construction of the logistic regression model for subgroup prediction based on the gene expression profiles of BMMNCs. This coding error was identified after the publication of the article. The authors should indicate the effect this error may have had on the data they now report.  

      Thank you for bringing this important issue to our attention. The error resulted from a mistake in the construction of the logistic regression model for subgroup prediction based on the gene expression profiles of BMMNCs. However, this issue does not affect our result because we analysed the expression data from scratch and generated our own gene signature scores. Also, the error has no impact on the genetics and clinical information that we received from the authors.

      (6) What information have the authors as to whether the differing RTE findings were not predominantly related to the differentiation state of the cell population analyzed (ie higher in BM MNCs vs CD34, Fig 1)? What control data have the authors regarding these values from normal (non-malignant) cell populations? 

      As described above, L1 and Alu, the two RTE families shared between the two cohorts, are still active and make new insertions in humans (Figure 2a-b). Our interpretation is that ERV activation in BM is associated with immune cells. This interpretation is further supported by the findings of Au et al. (DOI: 10.1016/j.ccell.2021.10.001), where several ERV loci had expression in purified immune cell subsets in renal cell carcinoma samples. 

      Unfortunately, neither of these two cohorts included normal (non-malignant) cell populations. We think that the unbiased way in which MOFA models heterogeneity is sufficient to capture the RTE derepressed phenotype of a subset of MDS cases compared to others, and we do not need normal cases to further support the finding. 

      (7) The statement in the Discussion regarding the effects of SRSF2 mutation is speculative and should be avoided. Many other somatic gene mutations have known stronger effects on prognosis for MDS. 

      One aim of this study is to identify specific immune signatures associated with SRSF2 and SF3B1 mutations, which are highly prevalent in MDS. Although other mutations, such as TP53, may have a stronger correlation with poor survival, numerous studies have demonstrated a clear link between SRSF2 mutations and poor prognosis.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors employ a combined proteomic and genetic approach to identify the glycoprotein QC factor malectin as an important protein involved in promoting coronavirus infection. Using proteomic approaches, they show that the non-structural protein NSP2 and malectin interact in the absence of viral infection, but not in the presence of viral infection. However, both NSP2 and malectin engage the OST complex during viral infection, with malectin also showing reduced interactions with other glycoprotein QC proteins. Malectin KD reduces replication of coronaviruses, including SARS-CoV-2. Collectively, these results identify Malectin as a glycoprotein QC protein involved in regulating coronavirus replication that could potentially be targeted to mitigate coronavirus replication.

      Overall, the experiments described appear well performed and the interpretations generally reflect the results. Moreover, this work identifies Malectin as an important pro-viral protein whose activity could potentially be therapeutically targeted for the broad treatment of coronavirus infection. However, there are some weaknesses in the work that, if addressed, would improve the impact of the manuscript.

      Notably, the mechanism by which malectin regulates viral replication is not well described. It is clear from the work that malectin is a pro-viral protein in the work presented, but the mechanistic basis of this activity is not pursued. Some potential mechanisms are proposed in the discussion, but the manuscript would be strengthened if additional insight was included. For example, is the UPR activated to higher levels in infected cells depleted of malectin? Do glycosylation patterns of viral (or non-viral) proteins change in malectin-depleted cells? Additional insight into this specific question would significantly improve the manuscript.

      We concur with the reviewer that the mechanism by which Malectin regulates viral replication remains unclear. It will be worth pursuing the molecular mechanisms underlying this phenotype in future studies. Our existing proteomics data sets can potentially offer additional insight into the questions posed here. Namely, we plan to analyze levels of protein markers of the UPR and other ER stress pathways in infected cells depleted of Malectin in our existing global proteomics data set. In addition, we will attempt to compare glycosylation patterns of endogenous proteins in Malectin-depleted cells. One caveat to this will be that it may be difficult to differentiate between spontaneous chemical deamidation and enzymatic PNGase F mediated deamidation.

      Further, the evidence for increased interactions between OST and malectin during viral infection is fairly weak, despite being a major talking point throughout the manuscript. The reduced interactions between malectin and other glycoproteostasis QC factors is evident, but the increased interactions with OST are not well supported. I'd recommend backing off on this point throughout the text, instead, continuing to highlight the reduced interactions.

      We note that the fold change increase of OST interactions with malectin is small compared to the fold change decrease of other glycoproteostasis factors. If this modest increase is consistent across replicates, we believe this bolsters the claim that it is a noteworthy change. However, if not, we can modify the text as suggested to emphasize the reduced interactions.

      I was also curious as to why non-structural proteins, nsp2 and nsp4, showed robust interactions with host proteins localized to both the ER and mitochondria? Do these proteins localize to different organelles or do these interactions reflect some other type of dysregulation? It would be useful to provide a bit of speculation on this point.

      We also find these ER and mitochondrial protein interactions curious, which we initially reported on (Davies, Almasy et al. 2020 ACS Infectious Diseases). In this prior report, we found that when expressed in HEK293T cells, SARS-CoV-2 nsp2 and nsp4 have partial localization to mitochondrial-associated ER membranes (MAMs), as determined by subcellular fractionation. Given that malectin has also been shown to have MAMs localization (Carreras-Sureda, et al. 2019 Nature Cell Biology), we can insert some speculation on this in the Discussion section.

      Again, the overall identification of malectin as a pro-viral protein involved in the replication of multiple different coronaviruses is interesting and important, but additional insights into the mechanism of this activity would strengthen the overall impact of this work.

      Reviewer #2 (Public Review):

      Summary:

      A strong case is presented to establish that the endoplasmic reticulum carbohydrate binding protein malectin is an important factor for coronavirus propagation. Malectin was identified as a coronavirus nsp2 protein interactor using quantitative proteomics and its importance in the viral life cycle was supported by using a functional genetic screen and viral assays. Malectin binds diglucosylated proteins, an early glycoform thought to transiently exist on nascent chains shortly after translation and translocation; yet a role for malectin has previously been proposed in later quality control decisions and degradation targeting. These two observations have been difficult to reconcile temporally. In agreement with results from the Locher lab, the malectin-interactome shown here includes a number of subunits of the oligosaccharyltransferase complex (OST). These results place malectin in close proximity to both the co-translational (STT3A or OST-A) and post-translational (STT3B or OST-B) complexes. It follows that malectin knockdown was associated with coronavirus Spike protein hypoglycosylation.

      Strengths:

      Strengths include using multiple viruses to identify interactors of nsp2 and quantitative proteomics along with multiple viral assays to monitor the viral life cycle.

      Weaknesses:

      Malectin knockdown was shown to be associated with Spike protein hypoglycosylation. This was further supported by malectin interactions with the OSTs. However, no specific role of malectin in glycosylation was discussed or proposed.

      We will emphasize our hypotheses on this point in the discussion and add a summary figure to highlight the specific role of malectin.

      Given the likelihood that malectin plays a role in the glycosylation of heavily glycosylated proteins like Spike, it is unfortunate that only 5 glycosites on Spike were identified using the MS deamidation assay when Spike has a large number of glycans (~22 sites). The mass spec data set would also include endogenous proteins. Were any heavily glycosylated endogenous proteins hypoglycosylated in the MS analysis in Fig 5D?

      We plan to interrogate this question in our existing MS deamidation proteomics data set as outlined above.

      The inclusion of the nsp4 interactome and its partial characterization is a distraction from the storyline that focuses on malectin and nsp2.

      We believe the nsp4 comparative interactome and functional genomics data offers a rich resource for further functional investigation by others, if made public. While we found the malectin and nsp2 storyline the most compelling to pursue, we believe the inclusion of the nsp4 data strengthens the overall approach, in agreement with Reviewer #3’s comments.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Davies and Plate set out to discover conserved host interactors of coronavirus non-structural proteins (Nsp). They used 293T cells to ectopically express flag-tagged Nsp2 and Nsp4 from five human and mouse coronaviruses, including SARS-CoV-1 and 2, and analyzed their interaction with host proteins by affinity purification mass-spectrometry (AP-MS). To confirm whether such interactors play a role in coronavirus infection, the authors measured the effects of individual knockdowns on replication of murine hepatitis virus (MHV) in mouse Delayed Brain Tumor cells. Using this approach, they identified a previously undescribed interactor of Nsp2, Malectin (Mlec), which is involved in glycoprotein processing and shows a potent pro-viral function in both MHV and SARS-CoV-2. Although the authors were unable to confirm this interaction in MHV-infected cells, they show that infection remodels many other Mlec interactions, recruiting it to the ER complex that catalyzes protein glycosylation (OST). Mlec knockdown reduced viral RNA and protein levels during MHV infection, although such effects were not limited to specific viral proteins. However, knockdown reduced the levels of five viral glycopeptides that map to Spike protein, suggesting it may be affected by Mlec.

      Strengths:

      This is an elegant study that uses a state-of-the-art quantitative proteomic approach to identify host proteins that play critical roles in viral infection. Instead of focusing on a single protein from a single virus, it compares the interactomes of two viral proteins from five related viruses, generating a high confidence dataset. The functional follow-ups using multiple live and reporter viruses, including MHV and CoV2 variants, convincingly depict a pro-viral role for Mlec, a protein not previously implicated in coronavirus biology.

      Weaknesses:

      Although a commonly used approach, AP-MS of ectopically expressed viral proteins may not accurately capture infection-related interactions. The authors observed Mlec-Nsp2 interactions in transfected 293T cells (1C) but were unable to reproduce those in mouse cells infected with MHV (3C). EIF4E2/GIGYF2, two bona fide interactors of CoV2 Nsp2 from previous studies, are listed as depleted compared to negative controls (S1D). Most other CoV2 Nsp2 interactors are also depleted by the same analysis (S1D). Previously reported MERS Nsp2 interactors, including ASCC1 and TCF25, are also not detected (S1D). Furthermore, although GIGYF2 was not identified as an interactor of MHV Nsp2/4 in human cells (S1D), its knockdown in mouse cells reduced MHV titers about 1000 fold (S4). The authors should attempt to explain these discrepancies.

      We plan to address these discrepancies with further elaboration in the text.

      More importantly, the authors were unable to establish a direct link between Mlec and the biogenesis of any viral or host proteins, by mass-spectrometry or otherwise. Although it is clear that Mlec promotes coronavirus infection, the mechanism remains unclear. Its knockdown does not affect the proteome composition of uninfected cells (S15B), suggesting it is not required for proteome maintenance under normal conditions. The only viral glycopeptides detected during MHV infection originated from Spike (5D), although other viral proteins are also known to be glycosylated. Cells depleted for Mlec produce ~4-fold less Spike protein (4E) but no more than 2-fold less glycosylated spike peptides (5D), compounding the interpretation of Mlec effects on viral protein biogenesis. Furthermore, Spike is not essential for the pro-viral role of Mlec, given that Mlec knockdown reduces replication of SARS-CoV-2 replicons that express all viral proteins except for Spike (6A/B).

      These are all important points. We plan to acknowledge some of these compounding factors in the Discussion.

      Any of the observed effects on viral protein levels could be secondary to multiple other processes. Interventions that delay infection for any reason could lead to an imbalance of viral protein levels because Spike and other structural proteins are produced at a much higher rate than non-structural proteins due to the higher abundance of their cognate subgenomic RNAs. Similarly, the observation that Mlec depletion attenuates MHV-mediated changes to the host proteome (S15C/D) can also be attributed to indirect effects on viral replication, regardless of glycoprotein processing. In the discussion, the authors acknowledge that Mlec may indirectly affect infection through modulation of replication complex formation or ER stress, but do not offer any supporting evidence. Interestingly, plant homologs of Mlec are implicated in innate immunity, favoring a more global role for Mlec in mammalian coronavirus infections.

      We plan to interrogate our existing proteomics data for signatures of ER stress in Mlec-depleted cells (as outlined above).

      Finally, the observation that both Nsp2 (3C) and Mlec (3E/F) are recruited to the OST complex during MHV infection neither supports nor refutes any of these alternate hypotheses, given that Mlec is known to interact with OST in uninfected cells and that Nsp2 may interact with OST as part of the full length unprocessed Orf1a, as it co-translationally translocates into the ER. Therefore, the main claims about the role of Mlec in coronavirus protein biogenesis are only partially supported.

      We plan to acknowledge this alternative hypothesis in the Discussion.

    1. Author response:

      We are grateful to the reviewers for their insightful comments on our manuscript and are encouraged by their overall favorable assessments. For the eLife Version of Record, we will make the following revisions to address reviewers’ comments and broaden the applicability of our technique in the zebrafish research community:

      (1) We will elaborate on various facets with additional details:

      a) Experimental conditions | We will specify the transgenic background, injected plasmids, larval stage, viral type, and viral titer clearly for each related experiment.

      b) Experimental methods | We will describe in more detail how to inject the virus into a target area in larval zebrafish.

      c) Data analysis | We will provide more detailed information on the paired electrical stimulation-calcium imaging study and on identifying connected Purkinje cells and granule cells during circuit reconstruction.

      d) Discussion | We will elaborate on trans-synaptic specificity concerning glial cell labeling, toxicity related to viral dose and temperature, and the potential issue of secondary starters and multi-step circuit tracing.

      (2) We will address the issue of glial cell labeling by adding more discussion and characterization, including potential mechanisms and implications, cell distribution, labeling progress, survival, and capability for viral transmission as starter cells.

      (3) We will modify the text of the manuscript to clarify additional points raised by the reviewers.

      (4) We will provide public repositories for accessing both the items and information on zebrafish lines, plasmids, viral vectors, and reconstructed data generated in this study.

      In the end, we will submit full responses to the reviewer comments along with the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public reviews:

      (1) Response to Reviewer #1: 

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block the blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two-time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survivals, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would be beneficial to the manuscript and the readers if the authors could improve the English of this manuscript by correcting obvious grammar errors, eliminating many of the acronyms that are not commonly used by the field, and providing a reason why this complicated but clever surgery procedure was designed and a summary table with the time course of all the morphological, functional, cellular, and transcriptome changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our English language proficiency and have taken the necessary steps to polish the article. Additionally, we have heeded your advice and reduced the use of field-specific acronyms to enhance readability for both the manuscript and its readers.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction section. Our group focuses on research into the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and possible ischemia-reperfusion injury following RAO, we developed a novel vascular-associated mouse model called the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. We drew inspiration from the widely employed middle cerebral artery occlusion (MCAO) model, commonly used in cerebral ischemic injury research, which guided the development of the UPOAO model.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article (Table S2, Summary Table), which will offer a comprehensive overview of the experimental results, thereby aiding in clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      (2) Response to Reviewer #2: 

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking into account your valuable feedback, we will thoroughly address the inaccuracies identified in the revised version. Additionally, we will commit to polishing the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies and genuinely thank you for bringing them to our attention.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Response to comment:

      The conclusions of this paper are mostly well supported by clear images and convincing data analysis, but some aspects of image presentation and additional data analysis may be needed to strengthen the manuscript.

      We sincerely appreciate your positive assessment of our work and your recognition of the clear images and convincing data analysis supporting our conclusions. Your constructive feedback on enhancing the clarity of our manuscript's image presentation and additional data analysis is highly valued. In response to your suggestions, we have taken steps to improve readability by removing or correcting uncommon acronyms from certain images. We have also conducted further data analysis to provide more comprehensive insights. Thank you for your guidance in improving the quality of our manuscript.

      (2) Response to recommendation (1):

      In Results 3.1 or in Method 2.2: please explain why this combination of silicone wire embolization and carotid artery ligation was chosen to replace previous models such as UCCAO? What are the advantages? And why the silicone wire embolus was inserted through ECA instead of inserting into CCA directly? The cleverly designed surgical procedure is very impressive but the reasoning behind it is not obvious and needs more explanation.

      Thank you for your valuable feedback.

      In the introduction, we briefly describe the rationale for developing the UPOAO model to simulate acute ischemia-reperfusion of retinal artery occlusion (RAO). Previously common retinal ischemia models had certain shortcomings. For example, in the HIOP model, which is often used for simulating glaucoma, the ischemic factor of interrupted retinal blood flow may be amplified due to the dual effects of IOP-induced mechanical stress [1, 2] and vascular ischemia due to normal saline perfusion in the anterior chamber. In the UCCAO model, recanalization is performed after ligation of the carotid blood vessels, and the retina communicates with the blood vessels in the brain, resulting in retinal hypoperfusion. Retinal ischemia in UCCAO is a chronic process (for example, the retina became thinner at week 10 and week 15 [3]), whereas RAO is an acute, total retinal ischemic disease. Therefore, it is critically important to develop a simple mouse model that can simulate acute retinal ischemia and reperfusion injury in RAO patients.

      Various models have been developed for ischemic stroke research, with the endoluminal suture model being the most commonly employed method for middle cerebral artery occlusion (MCAO). In this model, filaments are introduced through either the external or internal carotid artery and advanced into the middle cerebral artery, causing temporary blood flow blockage for a specific duration. This method has been extensively employed in studies involving transient occlusion [4]. Among the MCAO models, the Koizumi method (occlusion from the common carotid artery (CCA) to the middle cerebral artery (MCA)) and the Longa method (occlusion from the external carotid artery (ECA) to the MCA) are frequently used. Of these two methods, the Longa method is more widely utilized in research studies and has a much lower post-surgery mortality rate (26%) than the Koizumi method (44%) [5]. The MCAO model induces substantial infarct areas and significantly contributes to advancements in stroke research, including investigations into blood-brain barrier disruption and inflammatory responses to ischemia.

      RAO is considered a form of ocular stroke. Inspired by the MCAO model, we have employed a silicone wire embolus to induce acute interruption of blood flow to the retina. This approach enables the investigation of pathophysiological processes associated with RAO, providing valuable insights into the understanding of this condition. We have clarified these points in the revised manuscript (line 129).

      The reasoning behind inserting the silicone wire embolus through the ECA instead of directly into the CCA is twofold:

      (1) Convenience and avoidance of heavy bleeding and mortality. Inserting the silicone wire embolus requires creating an opening in the artery, which then needs to be ligated at both ends after the silicone wire embolus is removed to prevent excessive bleeding. The ECA's ability to form a straight line with the ICA after folding makes it more convenient for the entry and removal of the silicone wire embolus. This procedure is more convenient to perform on the ECA. The blood flow to the CCA can be restored after the plug is removed from ECA, ensuring that the blood supply to the brain through the CCA is not affected.

      (2) Preservation of reperfusion process. If the silicone wire embolus were inserted directly into the CCA, the ends of the CCA opening would need to be ligated after the silicone wire embolus is removed. This would result in a lack of reperfusion process after retinal ischemia. To enable the reperfusion process, the decision was made to open the ECA instead.

      We have clarified these points in the revised manuscript to better explain the rationale behind our methodology (line 139). Thank you for prompting this important clarification, which we believe will enhance the understanding of our readers.

      (3) Response to recommendation (2):

      Did the UPOPA actually block OA, including both the retinal (CRA) and choroidal (SPCA and LPCA) blood supply? If so, why does it seem only the inner retina was affected but not the outer retina?

      Thank you for your question. We agree with you that the UPOAO model blocks OA, which includes retinal and choroidal vessels. Our experimental results primarily indicate damage to the inner retinal layer within 7 days of reperfusion. For example, OCT and HE staining showed significant thinning of the inner retina after 60 minutes of ischemia followed by 7 days of reperfusion (Figure 4). At the same time, the b-wave amplitudes were decreased, which usually indicates damage to the inner layer of the retina. However, the outer retina did not appear to be affected by 60 minutes of ischemia based on the results of OCT, HE and immunofluorescence.

      The inner layer of the retina is known to show the highest sensitivity to hypoxic challenges [6], whereas the outer retinal layer is more resistant to hypoxic stress [7]. A possible reason for our results is that outer-layer cells such as photoreceptors are more tolerant of ischemia than the inner layer of the retina. Previous studies of retinal ischemia-reperfusion models support this assumption. In the UCCAO model, the b-wave was more affected than the a-wave. Decreases in the amplitudes of OPs, scotopic b-wave, and photopic b-wave were consistently observed at week 4 after UCCAO, while the amplitude of the scotopic a-wave did not change dramatically [8]. Prolonged ischemia, such as permanent ischemia, led to photoreceptor cell degradation, as seen in Stevens et al.'s report of photoreceptor loss 3 months after permanent ligation of both common carotid arteries in bilateral common carotid artery occlusion (BCCAO) [9]. In the HIOP model, the GCL and INL reacted sensitively to ischemic processes. Significant thinning of the GCL was observed as early as 6 hours after 60 minutes of ischemia [10]. Horizontal cells and photoreceptors remained mostly unaffected, while most RGCs and several amacrine cell subtypes disappeared [11, 12].

      Our study revealed the changes that occurred within 60 minutes of ischemia and the first 7 days of reperfusion in the UPOAO model. One possibility is that the ischemia duration in our model was not long enough to affect the outer retinal cells. Furthermore, the reperfusion observation period may not have been long enough to reveal structural damage and visual dysfunction in the outer retinal layer. As we have explained in the manuscript, further exploration is needed to understand changes induced by longer ischemia durations and reperfusion periods. Characterizing the damage to retinal structure and function after longer ischemia times will be a focus of our future research.

      (4) Response to recommendation (3):

      Better to only use well-accepted acronyms and remove those that are rarely seen in other publications, such as IMRL, MRL, HIOP, TRT, etc.

      Thank you for your valuable feedback. In our manuscript, we utilized the Spectralis HRA+OCT device (Heidelberg) to capture the retinal images. However, the resulting image layering did not adequately distinguish each retinal layer clearly. To address this limitation, we referred to a clinical OCT stratification approach in RVO and divided the retina into the inner, middle, and outer layers [16]. We acknowledge that this hierarchical description is not commonly used and have therefore followed your recommendation to remove these rare acronyms and instead employ the layer structure abbreviation along with the plus sign. The methods and results have been revised accordingly (line 213, line 368, Figure 4 and Figure S2).

      In addition, for the HIOP model, it is also known as the IR or RIRI model [17-19], and the pathophysiological process of retinal ischemia-reperfusion injury (IRI) is usually used to represent this type of anterior chamber perfusion model. To avoid confusion between the pathophysiological process of ischemia-reperfusion studied in this paper and the common model of high intraocular pressure, we have consistently referred to it as the HIOP model, an abbreviation that is cited in many references [20-22].

      Thanks again for the suggestion. We apologize for any confusion caused by the use of abbreviations and have made the necessary corrections in the manuscript. We have also strengthened the details of OCT layering in the images to enhance readability for our audience.

      (5) Response to recommendation (4):

      Figure 3F, G: What do the OP changes mean? What retina cell dysfunction leads to OP changes? Is there RGC-relevant visual function readout to correlate with RGC death?

      Oscillatory potentials (OPs) are important components of the electroretinogram (ERG). While the precise origin of OPs remains unclear, they are generally believed to be generated from the inner retinal layer, specifically involving bipolar cells, amacrine cells and ganglion cells [23]. OPs are sensitive indicators of retinal ischemic effects and can detect dysfunction before alterations in the b-waves occur [24-26] (We have added these statements at line 358). In this research, the reduction of OPs indicated dysfunction in the inner retinal layer and retinal ischemia.

      The function of RGCs can be non-invasively assessed using various ERG techniques that emphasize the activity of inner retinal neurons, including OPs of multifocal ERG (mfERG), photopic negative response (PhNR) in mfERG, pattern electroretinogram (PERG), and negative scotopic threshold response (nSTR) [27]. Among these indicators, the PERG appears to be more specifically related to the presence of functional RGCs. However, the complexity of electrophysiological sources and species-specific differences in RGC characteristics should also be considered. In addition, visual evoked potentials (VEP) can assess the function of visual signaling along the whole visual pathway from RGC axons to the visual cortex of the brain [28, 29]. Unfortunately, because the specific equipment required for evaluating RGC function was unavailable, we could not conduct a comprehensive assessment in this study. This limitation emphasizes the importance of future studies incorporating RGC evaluation, using indicators such as PERG and PhNR, to provide a more comprehensive understanding of visual pathway functionality and its implications.

      Thank you for your careful review and insightful questions.

      (6) Response to recommendation (5):

      Figure 4B: RNFL/GCL/IPL normally called GCC (ganglion cell complex).

      We appreciate your helpful recommendation regarding the abbreviation GCC (ganglion cell complex) for the combination of RNFL, GCL, and IPL. We have updated this terminology in the revised manuscript (line 213 and Figure 4).

      (7) Response to recommendation (6):

      Figure 4 A-F: Normally a circular OCT image surrounding the optic nerve head is preferred to measure retina thickness. If in these figures, all the OCT images are from the same location, it may be acceptable, but need to provide imaging details on how these OCT planes are selected and what has been done to make sure the same locations were selected for comparison.

      We agree with your comment that OCT images are usually captured around the optic nerve head. In this study, our goal was to assess both the thickness of the peripheral retina and the retina near the optic nerve head. To achieve this, we considered the optic nerve head as the apex of the selected field of view (left upper region of panel A in Figure 4). For each mouse, we obtained OCT images of the superior nasal (SN), superior temporal (ST), inferior nasal (IN), and inferior temporal (IT) fields of the optic nerve. We then averaged the thicknesses from these four fields. In each field, we measured and statistically evaluated the retinal thickness at distances of 1.5, 3, and 4.5 papillae diameters (PD) from the optic nerve head.

      This approach allowed us to ensure that the same locations were selected for comparison and provided a comprehensive assessment of retinal thickness across different regions. We have detailed this methodology in the revised manuscript to clarify the imaging process and the consistency of the selected locations.
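
      The averaging described above can be sketched as follows. This is only an illustrative sketch: it assumes that the four fields (SN, ST, IN, IT) are averaged separately at each distance from the optic nerve head, which the response does not state explicitly, and the function name, data layout, and thickness values are hypothetical rather than taken from the study.

```python
from statistics import mean

# Field and distance labels follow the response; everything else is hypothetical.
QUADRANTS = ("SN", "ST", "IN", "IT")
DISTANCES_PD = (1.5, 3.0, 4.5)  # papillae diameters from the optic nerve head


def mean_thickness_by_distance(thickness):
    """Average one eye's thickness over the four fields at each distance.

    thickness[quadrant][distance_pd] holds a single thickness measurement.
    Assumption: fields are averaged per distance, not pooled across distances.
    """
    return {
        d: mean(thickness[q][d] for q in QUADRANTS)
        for d in DISTANCES_PD
    }


# Made-up thickness values, for illustration only (not data from the study).
example_eye = {
    q: {1.5: 200 + i, 3.0: 190 + i, 4.5: 180 + i}
    for i, q in enumerate(QUADRANTS)
}
eye_means = mean_thickness_by_distance(example_eye)
print(eye_means)
```

      Each eye then contributes one value per distance to the group comparison.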

      Thank you for your insightful feedback.

      Reviewer #2:

      Addressing the following concerns is necessary to improve the manuscript.

      (1) Response to recommendation (1):

      The manuscript contains many grammatical errors and should be carefully reviewed for corrections. For example: In the title, "Silicone Wire Embolization-induced Acute Retinal Artery Ischemia and Reperfusion Model in Mouse: Gene Expression Provide Insight into Pathological Processes". It should be "Provides" instead of "Provide". In the Abstract, "The resident microglia within the retina and peripheral leukocytes which access to the retina were pronounced increased on reperfusion periods." It should be "pronouncedly" or "markedly" instead of " pronounced".

      Thank you for your careful reading and pointing out the grammatical errors in the manuscript. We apologize for these mistakes and have since revised and polished the article with the assistance of native English speakers. Ensuring accurate and clear language usage in scientific writing is crucial, and we appreciate your help in improving the quality of our manuscript. Thank you for bringing these errors to our attention.

      (2) Response to recommendation (2):

      Video 2: the video content from "30s-47s" and "50s-67s" is repeatedly shown.

      Thank you for your careful review of the video. In the process of preparing the external carotid artery for silicone wire embolus insertion, we first ligated the distal end with a square knot and then tied a loose knot at the proximal end. In the video content from "30s-47s" and "50s-67s", we are tying a square knot. We apologize for any confusion caused by these repeated video clips.

      (3) Response to recommendation (3):

      Figure 1: The ConA staining (H-I) and FFA (J-K) were performed before the removal of silicone wire embolus. It would be beneficial to clarify this in the figure legend too. Additionally, the label 'Post. Sup. Alveolar art.: Posterior superior alveolar artery' is not present in Figure 1L.

      Thank you for your thorough review of the manuscript and the valuable suggestions regarding Figure 1. We have updated the figure legend of Figure 1 to clarify that ConA staining (H-I) and FFA (J-K) were performed before the removal of the silicone wire embolus (line 868 and line 873). Additionally, we have included the label 'Post. Sup. Alveolar art' in Figure 1L as you pointed out. We appreciate your careful attention to detail, and we have ensured that these omissions have been rectified in the revised version of the manuscript.

      (4) Response to recommendation (4):

      Figure 2: only representative images of RGCs at the peripheral retina were shown. It is not clear if only RGCs in the peripheral retina were quantified. Is there RGC loss in the central and middle retina in the UPOAO model as well? How many fields of RGCs were quantified for each retina?

      Thank you for your meticulous review of the manuscript. The quantification method of RGCs is described in detail as follows:

      Four radial incisions were made in the retina and flattened on a glass slide to create a "four-leaf clover" shape. Retina was photographed using a fluorescence microscope (BX63, Olympus, Japan). We captured images from three different regions of each retinal quadrant: 0.1 mm-0.5 mm (central region, field numbers: 1, 4, 7, 10), 0.9 mm-1.3 mm (middle region, field numbers: 2, 5, 8, 11), and 1.7 mm-2.1 mm (peripheral region, field numbers: 3, 6, 9, 12) from the optic nerve head, respectively, as shown in Author response image 1.

      Of these, the peripheral field changes were the most noticeable, so we used the Leica SP8 confocal microscope (20X) to capture peripheral field RGCs as a demonstration (Figure 2A, C, E, G). RGC counts in twelve fields of each retina were quantified, and the average density of RGCs in the twelve fields per retina is shown in Figure 2B, D, F, K. RGC counts in the central (field numbers: 1, 4, 7, 10), middle (field numbers: 2, 5, 8, 11), and peripheral (field numbers: 3, 6, 9, 12) visual fields are shown in Author response tables 1-4. We have included this detailed methodology in the revised manuscript to clarify the quantification process and to address the presence of RGC loss in both the central and middle retina in the UPOAO model. Thank you for pointing out the need for this clarification.
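
      The field grouping described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the field numbering comes from the description above, while the function name and the example counts are hypothetical. Converting counts to densities would additionally require the field area, which is not given here.

```python
from statistics import mean

# Field numbering as described in the response: three fields per retinal quadrant.
REGION_FIELDS = {
    "central": (1, 4, 7, 10),
    "middle": (2, 5, 8, 11),
    "peripheral": (3, 6, 9, 12),
}


def summarize_rgc_counts(counts_by_field):
    """Return the mean RGC count per region and across all twelve fields."""
    missing = set(range(1, 13)) - set(counts_by_field)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    summary = {
        region: mean(counts_by_field[f] for f in fields)
        for region, fields in REGION_FIELDS.items()
    }
    summary["all_fields"] = mean(counts_by_field[f] for f in range(1, 13))
    return summary


# Made-up counts, for illustration only (not data from the study).
example_counts = {f: 100 + f for f in range(1, 13)}
rgc_summary = summarize_rgc_counts(example_counts)
print(rgc_summary)
```

      The "all_fields" value corresponds to the single per-retina average, and the three regional means correspond to the breakdown by central, middle, and peripheral fields.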

      Author response image 1.

      Schematic diagram of field selection. Scale bar=1.4 mm. Each retinal petal has three distinct visual fields (the area circled by the green line) that radiate from the optic nerve head to the periphery, in that order, the central, middle, and peripheral visual fields.

      Author response table 1.

      RGCs counts in each field of each retina (30-minute ischemia and 3-day reperfusion)

      Author response table 2.

      RGCs counts in each field of each retina (30-minute ischemia and 7-day reperfusion)

      Author response table 3.

      RGCs counts in each field of each retina (60-minute ischemia and 3-day reperfusion)

      Author response table 4.

      RGCs counts in each field of each retina (60-minute ischemia and 7-day reperfusion)

      (5) Response to recommendation (5):

      Figure 3: The representative wave lines in panels A (60min_3d, 60min_7d) and F do not reflect the statistical analysis presented in panels D, E, and G, especially for the amplitudes of b waves and OPs.

      Thank you for your careful review of the manuscript. We have added labels for the a-waves and b-waves and improved the presentation of OPs to make the details of the amplitudes more visible (Figure 3). In the previous version, due to incorrect settings, we did not adjust the ordinate spacing when fitting the curves of the representative wave lines in the four groups, resulting in the curves being compressed vertically to the same height. We have now adjusted the curves to be fitted under the same scale bar (shown in the bottom right corner of Figure 3A). In addition, we removed the baseline wave from the OPs and adjusted the abscissa scale to highlight the N waves and P waves for easy reading (Figure 3F).

      (6) Response to recommendation (6):

      There are two different Supplementary Figure 1 and no Supplementary Figure 3, resulting in misaligned references to Supplementary Figures 1, 2, and 3 in the text.

      Thank you for your careful review of the manuscript. We have reviewed the manuscript again and identified errors in uploading the supplementary figures, which resulted in duplicate Supplementary Figure 1 and the absence of Supplementary Figure 3. We have corrected these issues and realigned the references to Supplementary Figures 1, 2, and 3 in the text to ensure consistency. We appreciate your attention to detail and your reminder to address this issue.

      (7) Response to recommendation (7):

      There is confusion about the definition of ORL (outer retina layer). In Lines 208-209, ORL was defined as the combined thickness of the rest to the retinal pigment epithelium (RPE). It seems the ONL is included in ORL. But in lines 358-359, 907-908, "the ORL encompassed the region from the inner segment/outer segment (IS/OS) to the RPE". Please make the definition consistent. In addition, it is hard to distinguish the regions marked by the green lines in Fig. 4A (sham image) after Line 902.

      Thank you for your careful review of the manuscript. We have addressed the confusion regarding the definition of the outer retinal layer (ORL). The Heidelberg OCT device does not distinguish the layers of the mouse retina well, so we divided it into three broader layers:

      (1) Ganglion Cell Complex (GCC) layer, which encompasses RNFL+GCL+IPL.

      (2) Middle Retinal Layer, which includes INL+OPL.

      (3) Outer Retinal Layer (ORL), which includes ONL+IS/OS+RPE.

      We apologize for the inconsistency and have revised the errors in the manuscript and figure legends accordingly. Additionally, we have removed rare domain-specific acronyms and replaced them with more commonly understood abbreviations, as suggested, to avoid confusion.

      Furthermore, we have enlarged parts of the OCT images to better display the layers, hoping to meet the readers' requirements and improve clarity. Thank you for your valuable feedback.

      (8) Response to recommendation (8):

      Figure 4 (Panels H-J, L-M) incorporated with the text (Line 902) differs from the high-resolution version of Figure 4 included later in the manuscript. In Figure 4 (Panels H-J, L-M) merged with the text (Line 902), the quantification of the IPL and INL thickness is incorrect, and the scale bar is inaccurate. However in the high-resolution version of Figure 4 provided later, the thickness of the RNFL+GCL is incorrect.

      Thank you for your careful review of the manuscript. The quantification of the IPL and INL thickness in Figure 4 (Panels H-J, L-M) incorporated with the text has been revised to ensure accurate measurements and scale bars (Figure 4 and line 924). The high-resolution version of Figure 4 provided later has been updated to correct the thickness measurements of the RNFL+GCL. We have ensured that the ordinate in the high-resolution version of Figure 4 now correctly represents length units, consistent with the equal proportional conversion used in the integrated text figures.

      Thank you for your valuable feedback and for pointing out these errors. We have made the necessary corrections to align the figures accurately with the manuscript.

      (9) Response to recommendation (9):

      Line 384-386: the statement "Notably, a-waves in ERG and the thickness of the outer retinal layers in both OCT and HE remained unchanged." is not accurate, since a-waves in ERG is not changed in 3 days but changed in 7 days, and the thickness of the outer retinal layers in HE is either not measured or not shown in Figure 4.

      Thank you for your careful review of the manuscript. We apologize for this error and have revised it.

      We aimed to convey that the amplitude of the a-waves, which represents the function of the photoreceptors, does not show significant variation, which is consistent with the thickness of the outer retinal layer observed in OCT and HE images. Our results indicated that at 7 days post-injury, the amplitude of the a-waves in ERG was statistically different only at stimulus light intensities of 0.3, 3.0 and 10.0 cd·s/m². In contrast, the b-wave amplitude was reduced by half compared to sham eyes at almost all stimulus light intensities. At the same time, the immunofluorescence staining results for photoreceptor cells showed no significant change at 7 days. Therefore, we consider the change in a-wave amplitude not to be significant compared with the marked decrease in b-wave amplitude. We have clarified this in the revised manuscript.

      We also analyzed the thickness of the outer retinal layers in HE and found it to be consistent with the OCT results, showing no significant changes (shown in Author response image 2 below).

      Thank you for your valuable feedback, which has helped improve the accuracy and clarity of our manuscript.

      Author response image 2.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      (10) Response to recommendation (10):

      Figure 5 and Figure S3: Quantification data from different sections of the same retina should be averaged to represent one single sample (one data point) for statistical analysis. * in images of Fig. 5E, F, I, J is not defined in the figure legend. It would be easier for readers to follow if the GCL, IPL, INL, and OPL were labeled in retinal sections.

      Thank you for your careful review of the manuscript and recommendation. We have redone the statistical analysis and updated the results in Figure 5 and Figure S3. In the UPOAO experimental eyes, no significant change in the number of HCs (Calbindin) was observed during the 3-day reperfusion period, while a notable reduction was observed after 7 days (Figure 5). Additionally, we have added the definition of the asterisks (*) in the figure legend to clarify their significance. We have also labeled the retinal layers, including the GCL, IPL, INL, OPL, and ONL, in the images to make it easier for readers to follow and understand the data.
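
      The reviewer's recommendation, that counts from different sections of the same retina be averaged into one data point before statistical testing, can be sketched as follows. This is a minimal illustration under assumed names: the function, the retina identifiers, and the counts are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def per_retina_means(section_counts):
    """Collapse section-level counts into one mean value per retina.

    section_counts is an iterable of (retina_id, count) pairs, one pair per
    section. The returned dict has one entry per retina, so each retina
    contributes a single data point to the downstream statistical test.
    """
    grouped = defaultdict(list)
    for retina_id, count in section_counts:
        grouped[retina_id].append(count)
    return {retina_id: mean(values) for retina_id, values in grouped.items()}


# Made-up section counts, for illustration only (not data from the study).
sections = [("m1", 10), ("m1", 12), ("m1", 14), ("m2", 20), ("m2", 22)]
retina_means = per_retina_means(sections)
print(retina_means)
```

      With this layout the sample size equals the number of retinas, not the number of sections, which avoids treating sections from the same eye as independent samples.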

      Thank you for helping us improve the clarity and accuracy of our manuscript.

      (11) Response to recommendation (11):

      Lines 407-409, the statement "which aligns with the a-waves observed in ERG (Figure 3D, E) and the changes seen in the outer retinal layers in OCT (Fig S2C, D)" is confusing. No changes were observed by OCT in Fig S2D.

      Thank you for your review; we apologize for the confusion. The overall trend of the a-wave amplitude in ERG at 7 days did not change significantly, which is consistent with the immunofluorescence staining results for the photoreceptor cells. Based on these observations, we consider that the change in a-wave amplitude was not significant. However, as you pointed out in recommendation 9, since the a-waves in ERG were changed at 7 days at stimulus light intensities of 0.3, 3.0, and 10.0 cd·s/m², our description of the a-waves at 7 days was not accurate. We have clarified this point in the revised manuscript to ensure it accurately reflects the data presented.

      (12) Response to recommendation (12):

      In Figure S4, panel C shows lymphocyte-mediated immunity, and panel D shows leukocyte-mediated immunity. Please adjust the figure legend accordingly to reflect the figures.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure S4.

      (13) Response to recommendation (13):

      Lines 440-442 state "These results suggested early ischemic processions such as cell migration and potential collateral vessel formation." It is not clear why and how "potential collateral vessel formation" is suggested by Figure 6 and Figure S4. Please clarify this in the text.

      Thank you for your careful review of the manuscript. We have removed the original sentence due to insufficient evidence and replaced it with: "These results suggested that in the early stage of retinal ischemic injury, leukocytes from the microvasculature may infiltrate retinal tissue. More experimental validation will be performed to confirm this hypothesis." (line 448). We will be more cautious in drawing conclusions in the future. Thank you for your reminder.

      (14) Response to recommendation (14):

      For the figure legend of Figure 6 "In each heatmap, upper box showed the top 10 up-regulated genes, and the below one showed the top 10 down-regulated genes." Is this correct? It appears that the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure 6. In the heatmaps, the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes (line 977).

      (15) Response to recommendation (15):

      For the figure legend of Figure 7, the statement 'Data points are from retinal sections of four animals' is incorrect, as these data were obtained from whole retinas instead of retinal sections. Please revise the legend to reflect this accurately. The scale bar was absent in the images of Figure 7. Asterisk in Figure 7H and 7I was not defined.

      Thank you for your careful review of the manuscript; we have corrected these errors. We have added the scale bar (Figure 7D). The white asterisks in Figure 7H and 7I indicate the activated microglial cells, and we have added this definition to the legend of Figure 7 (line 981).

      (16) Response to recommendation (16):

      It would be better to switch the order of Figure S7 and Figure S8 to align with their descriptions in the text.

      Thank you for your recommendation and we have switched the order of Figure S7 and Figure S8.

      (17) Response to recommendation (17):

      The gene names in Figure S8 should be written consistently with those listed in Table S1.

      Thank you for your recommendation and we have corrected the gene names.

      (18) Response to recommendation (18):

      In Figure 9, it is not clear why amacrine cells were not included in the UPOAO model, as amacrine cells were also injured as shown in Figure 5I-L.

      Thank you for your careful review of the manuscript and we have added amacrine cells in Figure 9.

      References

      (1) Yang, H., et al., The connective tissue phenotype of glaucomatous cupping in the monkey eye - Clinical and research implications. Prog Retin Eye Res, 2017. 59: p. 1-52.

      (2) Pavlatos, E., et al., Regional Deformation of the Optic Nerve Head and Peripapillary Sclera During IOP Elevation. Invest Ophthalmol Vis Sci, 2018. 59(8): p. 3779-3788.

      (3) Lee, D., et al., A mouse model of retinal hypoperfusion injury induced by unilateral common carotid artery occlusion. Experimental Eye Research, 2020. 201: p. 108275.

      (4) Barthels, D. and H. Das, Current advances in ischemic stroke research and therapies. Biochim Biophys Acta Mol Basis Dis, 2020. 1866(4): p. 165260.

      (5) Smith, H.K., et al., Critical differences between two classical surgical approaches for middle cerebral artery occlusion-induced stroke in mice. J Neurosci Methods, 2015. 249: p. 99-105.

      (6) Janáky, M., et al., Hypobaric hypoxia reduces the amplitude of oscillatory potentials in the human ERG. Doc Ophthalmol, 2007. 114(1): p. 45-51.

      (7) Tinjust, D., H. Kergoat, and J.V. Lovasik, Neuroretinal function during mild systemic hypoxia. Aviat Space Environ Med, 2002. 73(12): p. 1189-94.

      (8) Lee, D., et al., Retinal Degeneration in a Murine Model of Retinal Ischemia by Unilateral Common Carotid Artery Occlusion. Biomed Res Int, 2021. 2021: p. 7727648.

      (9) Yamamoto, H., et al., Complex neurodegeneration in retina following moderate ischemia induced by bilateral common carotid artery occlusion in Wistar rats. Exp Eye Res, 2006. 82(5): p. 767-79.

      (10) Palmhof, M., et al., From Ganglion Cell to Photoreceptor Layer: Timeline of Deterioration in a Rat Ischemia/Reperfusion Model. Front Cell Neurosci, 2019. 13: p. 174.

      (11) Adachi, M., et al., High intraocular pressure-induced ischemia and reperfusion injury in the optic nerve and retina in rats. Graefes Arch Clin Exp Ophthalmol, 1996. 234(7): p. 445-51.

      (12) Jehle, T., et al., Quantification of ischemic damage in the rat retina: a comparative study using evoked potentials, electroretinography, and histology. Invest Ophthalmol Vis Sci, 2008. 49(3): p. 1056-64.

      (13) Hayreh, S.S., H.E. Kolder, and T.A. Weingeist, Central retinal artery occlusion and retinal tolerance time. Ophthalmology, 1980. 87(1): p. 75-8.

      (14) Luo, X., et al., Hypoglycemia induces general neuronal death, whereas hypoxia and glutamate transport blockade lead to selective retinal ganglion cell death in vitro. Invest Ophthalmol Vis Sci, 2001. 42(11): p. 2695-705.

      (15) Schmid, H., et al., Loss of inner retinal neurons after retinal ischemia in rats. Invest Ophthalmol Vis Sci, 2014. 55(4): p. 2777-87.

      (16) Furashova, O. and E. Matthè, Hyperreflectivity of Inner Retinal Layers as a Quantitative Parameter of Ischemic Damage in Acute Retinal Vein Occlusion (RVO): An Optical Coherence Tomography Study. Clin Ophthalmol, 2020. 14: p. 2453-2462.

      (17) Pang, Y., et al., CD38 Deficiency Protects Mouse Retinal Ganglion Cells Through Activating the NAD+/Sirt1 Pathway in Ischemia-Reperfusion and Optic Nerve Crush Models. Invest Ophthalmol Vis Sci, 2024. 65(5): p. 36.

      (18) Feng, Y., et al., GSK840 Alleviates Retinal Neuronal Injury by Inhibiting RIPK3/MLKL-Mediated RGC Necroptosis After Ischemia/Reperfusion. Invest Ophthalmol Vis Sci, 2023. 64(14): p. 42.

      (19) Zeng, S., et al., CREG Protects Retinal Ganglion Cells loss and Retinal Function Impairment Against ischemia-reperfusion Injury in mice via Akt Signaling Pathway. Mol Neurobiol, 2023. 60(10): p. 6018-6028.

      (20) Rosenbaum, D.M., et al., The role of the p53 protein in the selective vulnerability of the inner retina to transient ischemia. Invest Ophthalmol Vis Sci, 1998. 39(11): p. 2132-9.

      (21) Zhang, Y., et al., Melatonin Alleviates Pyroptosis of Retinal Neurons Following Acute Intraocular Hypertension. CNS Neurol Disord Drug Targets, 2021. 20(3): p. 285-297.

      (22) Zhu, J., et al., Protective effects of Erigeron breviscapus Hand.- Mazz. (EBHM) extract in retinal neurodegeneration models. Mol Vis, 2018. 24: p. 315-325.

      (23) Wachtmeister, L., Oscillatory potentials in the retina: what do they reveal. Prog Retin Eye Res, 1998. 17(4): p. 485-521.

      (24) Cao, W., et al., Dextromethorphan attenuates the effects of ischemia on rabbit electroretinographic oscillatory potentials. Documenta Ophthalmologica, 1993. 84(3): p. 247-256.

      (25) Xu, J., et al., Pregabalin Mediates Retinal Ganglion Cell Survival From Retinal Ischemia/Reperfusion Injury Via the Akt/GSK3β/β-Catenin Signaling Pathway. Invest Ophthalmol Vis Sci, 2022. 63(12): p. 7.

      (26) Takács, B., et al., Electroretinographical Analysis of the Effect of BGP-15 in Eyedrops for Compensating Global Ischemia-Reperfusion in the Eyes of Sprague Dawley Rats. Biomedicines, 2024. 12(3).

      (27) Porciatti, V., Electrophysiological assessment of retinal ganglion cell function. Exp Eye Res, 2015. 141: p. 164-70.

      (28) Ridder, W.H. and S. Nusinowitz, The visual evoked potential in the mouse—Origins and response characteristics. Vision Research, 2006. 46(6): p. 902-913.

      (29) Liu, S., et al., An optimized procedure to record visual evoked potential in mice. Exp Eye Res, 2022. 218: p. 109011.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al., investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that apical length-independent tension control alone cannot explain bending phenomena in the cellular Potts Model, contrasting with the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term, with zero rest value (due to endocytosis/exocytosis), is necessary in constricting cells and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      - The finding of the required mechanisms for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      - Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      We thank the reviewer for the careful comments and insightful suggestions.

      Weaknesses:

      - The authors claim that the cellular Potts Model is unable to obtain the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided with vertex model simulations, employing similar setups and rules, and explaining tissue bending solely through an increase in a length-independent apical tension.

      Studies cited in a previous paragraph included the simulations employing the increased length-independent apical tension. For the sake of clarity, we added the citation to them as below.

      P4L174: “In contrast to the simulations in the preceding studies (Sherrard et al., 2010; Conte et al., 2012; Perez-Mockus et al., 2017; Pérez-González et al., 2021), our simulations could not reproduce the apical constriction”.

      We did not copy the parameters of the vertex models in the preceding studies because we also found that the apical, lateral, and basal surface tensions must be balanced; otherwise, the epithelial cells could not maintain their integrity (Figure 1—figure supplement 1). The ratio used in the preceding studies was outside of this suitable range.

      - The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      We appreciate the comment and added it to the discussion as suggested.

      P12L301: “Even when the vertex model allowed the curved lateral surface, the model did not assume the cells to be rearranged and change neighbors, limiting the cell delamination (Pérez-González et al., 2021).”

      P12L311: “Note that the vertex model could also be extended to incorporate the curved edges and rearrangement of the cells by specifically programming them, and would reproduce the cell delamination. That is, we could find the importance of the balanced pressure because the cellular Potts model intrinsically included a high degree of freedom for the cell shape, the cell rearrangement, and the fluctuation.”

      - The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      The reviewer is correct but Polyakov et al. assumed “that the cytoskeletal components lining the inside membrane surfaces of the cells provide these surfaces with springlike elastic properties” without justification. We assumed that the myosin activity generated not the elasticity but the contractility based on Labouesse et al. (2015), and expected that the surface elasticity corresponded with the membrane elasticity. Also, in the physical concept, we clarified how the contractility and the elasticity differently deformed the cells and tissue, and demonstrated why the elasticity was important for the apical constriction. We added it to the discussion as below.

      P12L316: “In the preceding studies, the apically localized myosin was assumed to generate either the contractile force (Sherrard et al., 2010; Conte et al., 2012; Perez-Mockus et al., 2017; Pérez-González et al., 2021) or the elastic force (Polyakov et al., 2014; Inoue et al., 2016; Nematbakhsh et al., 2020). However, the limited cell shape in the vertex model made them similar in terms of the energy change during the apical constriction, i.e., the effective force to decrease the apical surface. In this study, we showed that the contractile force and the elastic force differently deformed the cells and tissue, and demonstrated why and how the elasticity was important for the apical constriction.”

      - The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

      We agree with the comment.

      For the initial configuration, we added an explanation to Tissue deformation by increased apical contractility with cellular Potts model section in the Results as below.

      P4L170: “A simulation started from a flat monolayer of cells beneath the apical ECM, and was continued until resulting deformation of cells and tissue could be evaluated for success or failure of reproducing the apical constriction.”

      For the parameter values we added a section “Parameters for the simulations” in the Methods.

      For the parameters unit conversion, we did not measure the surface tension and cell pressure in an actual tissue and thus could not compare the parameters to the actual forces. Instead, we varied the parameters and demonstrated that the apical constriction was reproduced with the wide range of the parameter values. We added it to the discussion as below.

      P12L310: “It succeeded with a wide range of parameter values, indicating a robustness of the model.”

      Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. The mostly computational work relies on the Cellular Potts model. The main result shows that an increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model, where they consider an alternative driver, namely the "apical surface elasticity".

      Strengths:

      It is surprising that despite the fact that apical constriction and tissue invagination are probably most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      We thank the reviewer for recognizing the importance and novelty of our work.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

      We thank the reviewer for the careful assessment and suggestions. However, our simulation was computationally expensive, and modeling the epithelium in an analytically calculable expression would require substantial additional work that is beyond the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Reference line 648: Correct the author's name (Pérez-González).

      We thank the reviewer and corrected the reference.

      (2) "Pale" colors are challenging to discern.

      We updated the figures.

      (3) Figure 1j: What does the yellow color in the cellular junction represent?

      We used the apical lateral site colored yellow in Fig. 1e-f’ to simulate the effect of the adherens junction. We updated the figure legend.

      (4) Figure 2c - left: Why is there a red apical junction?

      Our simulation model marked the apical junction in the initial configuration and updated the marking based on connectedness to other surrounding sites marked as apical in the same cell. However, once a cell was delaminated and lost its apical junction, any surface sites not adjacent to other epithelial cells were marked as basal junction because they were not adjacent to the apical junction.

      We added it to Cellular Potts model with partial surface elasticity section in the Methods as below.

      P17L430: “To simulate the differential physical properties of the apical, lateral, and basal surfaces, the subcellular locations are marked automatically, and the marking is updated during the simulation. In each cell, sites adjacent to different cells but not to the medium are marked as lateral.

      At the initial configuration, sites adjacent to the apical ECM are marked as apical, and during the simulation, sites adjacent to medium and other apical sites in the same cell are marked as apical.

      The remaining sites, which are adjacent to medium but not marked as apical, are marked as basal.

      Therefore, once a cell is delaminated and loses its apical surface, afterwards all sites in the cell adjacent to the medium are marked as basal even if they are adjacent to the apical ECM or the outer body fluid.”
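The marking rules quoted above can be sketched on a small lattice as follows. This is only an illustrative reading of the text, not the authors' implementation: it assumes a 2D grid with 4-neighbor adjacency, a single hypothetical id `MEDIUM = 0` standing for every non-cell site (medium, apical ECM, and body fluid alike), and that "other apical sites" refers to labels from the previous update step.

```python
MEDIUM = 0  # assumption: one id for all non-cell sites (medium, ECM, fluid)


def neighbors(grid, r, c):
    """Yield the in-bounds 4-neighbor coordinates of site (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            yield rr, cc


def update_labels(grid, prev_labels):
    """Relabel the surface sites of every cell as apical, lateral, or basal.

    grid[r][c] holds a cell id (> 0) or MEDIUM. prev_labels maps (r, c)
    to the label assigned at the previous step; interior sites that touch
    neither the medium nor another cell receive no label.
    """
    labels = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == MEDIUM:
                continue
            nbrs = list(neighbors(grid, r, c))
            touches_medium = any(grid[rr][cc] == MEDIUM for rr, cc in nbrs)
            touches_other = any(grid[rr][cc] not in (MEDIUM, cell) for rr, cc in nbrs)
            if touches_medium:
                # Apical only if a neighboring site of the same cell was apical.
                near_apical = any(
                    grid[rr][cc] == cell and prev_labels.get((rr, cc)) == "apical"
                    for rr, cc in nbrs
                )
                labels[(r, c)] = "apical" if near_apical else "basal"
            elif touches_other:
                labels[(r, c)] = "lateral"
    return labels


# Two cells (ids 1 and 2), three sites tall, with medium above and below.
grid = [
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 0, 0],
]
initial = {(1, c): "apical" for c in range(4)}  # top row starts as apical
labels = update_labels(grid, initial)

# If cell 2 has lost its apical marking, its medium-facing sites become basal.
delaminated = update_labels(grid, {(1, 0): "apical", (1, 1): "apical"})
```

In this sketch, a cell with no remaining apical site can never regain one, which reproduces the behavior described for delaminated cells.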

      (5) Figure 4a: The snapshots are not in a steady state but in the middle of deformation. Is the time the same for all snapshots? The motivation to change P_0a is related to endocytosis. However, this could be achieved by decreasing P_0a to a non-zero value. Here, in the more drastic limit, the depth (a measure of bending) is very slight, approximately half of a cell size. What physically limits further invagination? Is it the number of cells or the range of parameters under study?

      The time length was the same for simulations in each figure, and we added this to the Parameters for the simulations section in the Methods as below.

      P18L466: “In each figure, snapshots of the simulations show deformation by the same time length unless specified.”

      For P_0a, the reviewer is correct, and the iterated ratcheting may decrease P_0a step by step instead of making it 0 immediately. Still, with P_0a > 0, the energy function and its derivative are both increasing with respect to the apical width as long as P_a > P_0a, and thus the apical shrinkage would be synchronized, even though the deformation would be smaller. We also ran simulations by decreasing P_0a to 0.6 times the initial P_a, and observed smaller deformation as expected. On the other hand, the non-zero P_0a made the invagination deeper when it was combined with the effect of the surrounding supracellular myosin cable, possibly due to a resistance of the apical surface against compression. One of the novel and important findings in this study is the synergistic effect of the elasticity-based apical constriction and the surrounding supracellular myosin cable. To demonstrate that the deep invagination was not due to the apical surface resistance against the compression, we showed the simulations with P_0a = 0.
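The argument about the derivative can be made concrete with a small numerical sketch. It assumes, for illustration only, a quadratic elastic term E(P_a - P_0a)^2 and a linear contractile term J·P_a; the exact form and coefficients of the manuscript's equations are not reproduced here, and the function names and parameter values are hypothetical.

```python
def elastic_energy(p_a, p_0a, stiffness=1.0):
    """Assumed quadratic elastic energy of an apical surface of width p_a."""
    return stiffness * (p_a - p_0a) ** 2


def elastic_force(p_a, p_0a, stiffness=1.0):
    """Derivative of the elastic energy with respect to p_a.

    Positive values mean the energy decreases when the surface shrinks.
    The value grows with p_a, so wider cells are pulled harder.
    """
    return 2.0 * stiffness * (p_a - p_0a)


def contractile_force(p_a, tension=1.0):
    """Derivative of an assumed linear contractile energy tension * p_a.

    It does not depend on p_a, so wide and narrow cells feel the same pull.
    """
    return tension


# With rest width 0, a cell twice as wide feels twice the shrinking force.
wide, narrow = elastic_force(10.0, 0.0), elastic_force(5.0, 0.0)

# With a non-zero rest width the ordering is kept as long as p_a > p_0a.
wide_rest, narrow_rest = elastic_force(10.0, 6.0), elastic_force(8.0, 6.0)
```

Under these assumptions the elastic term pulls harder on wider apical surfaces, which is one way to read the statement that the shrinkage is synchronized, whereas the linear term provides no such width-dependent feedback.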

      For the conditions for further invagination, it may include the number of cells, a ratio between the cell height and width (Figure 5—figure supplement 1), interaction with ECM (Figure 5—figure supplement 2), etc. For the parameter, there might be an upper limit (Figure 4). We did not test the number of cells because of its computational cost. Among the conditions we tested, we found the planar compression by surrounding supracellular myosin the most influential rather than the mechanical property of apically constricting cells themselves.

      How each condition and parameter contributes to the invagination should be studied in the future. We added it to the conclusion as below.

      P15L395: “The depth, curvature, and speed of the invagination might be influenced by the cell shape, configuration, and parameters, and how each condition contributes to the invagination shall be studied in future.”

      (6) Figure 6b: What does the cell-surface color represent? If the idea was to represent junction tension, it would be clearer to color the junctions only.

      The junction tension may vary differently in different situations. For example, T1 transition is accompanied by enriched myosin along a shrinking cell-cell junction, and the junction bears higher tension, but other junctions of the same cell do not and thus the cell does not decrease its apical surface. In chick embryo neural tube closure, the junction tension is also polarized, and the cells shrink the apical surface along medial-lateral axis, driving the apical constriction (Nishimura et al., 2012, doi:10.1016/j.cell.2012.04.021). In the case of Drosophila embryo tracheal invagination, the cells shrank their apical surface isotropically (Figure 6a). If the junction tension was responsible for the shrinkage, all junctions of the cell must bear higher tension. Based on this assumption, the junction tension was averaged in each cell to check if the tracheal cells bore the higher average tension than surrounding cells.

      We also plotted stress tensor and calculated nematic order to check if there was radial or encircling tension alignment in the tracheal pit, but there was not.

      (7) Figure 6c: What does the junction color represent here?

      The junction color represents the relative junctional tension. We updated the figure legend.

      (8) Figure 6d-e: It is challenging to understand which error bar corresponds to each dataset.

      We updated the figure.

      (9) What is the definition of relative pressure?

      The geometrical tension inference method assumes that the tissue is in mechanical equilibrium and that the sum of the junctional tensions and cell pressures pulling or pushing on a vertex (tricellular junction) is 0. Therefore, the calculated tensions and pressures are determined only relative to each other, not as absolute values. We added it to the 3D Bayesian tension inference section of the Methods as below.

      P24L567: “Since Equation 13 and Equation 14 only evaluate the balance among the forces, it cannot estimate an absolute value but a relative value of the tension and pressure.”
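Why a force balance yields only relative values can be seen in a minimal example with one tricellular vertex and no pressure term. This is not the 3D Bayesian method of the manuscript; it is a 2D sketch in which the three junction directions are hypothetical inputs and the tensions are normalized to a mean of 1 purely by convention.

```python
import math


def cross(u, v):
    """Return the z-component of the 2D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def relative_tensions(d1, d2, d3):
    """Tensions balancing three junctions that meet at one vertex.

    d1, d2, d3 are unit vectors pointing away from the vertex along each
    junction. The balance t1*d1 + t2*d2 + t3*d3 = 0 fixes the tensions
    only up to a common factor, so they are rescaled to a mean of 1.
    """
    raw = (cross(d2, d3), cross(d3, d1), cross(d1, d2))
    scale = sum(raw) / 3.0
    return tuple(t / scale for t in raw)


def net_force(tensions, directions):
    """Sum of tension-weighted direction vectors at the vertex."""
    fx = sum(t * d[0] for t, d in zip(tensions, directions))
    fy = sum(t * d[1] for t, d in zip(tensions, directions))
    return fx, fy


# Hypothetical vertex: two perpendicular junctions and one diagonal junction.
s = 1.0 / math.sqrt(2.0)
directions = ((1.0, 0.0), (0.0, 1.0), (-s, -s))
tensions = relative_tensions(*directions)

# Multiplying every tension by the same factor still balances the vertex.
scaled = tuple(5.0 * t for t in tensions)
```

Both `tensions` and `scaled` satisfy the balance equation, so the data cannot distinguish between them; only ratios such as `tensions[2] / tensions[0]` are meaningful.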

      (10) In the main text, it is mentioned that a large Es (apical elastic constant) leads to flat surfaces, avoiding bending, but the abstract says "strong apical surface tension," which, according to the rest of the text, would seem to be J_apical. Clarification is needed.

      The surface tension includes both the surface contractility and the surface elasticity.

      We added it to Extended cellular Potts model to simulate epithelial deformations section in the Results as below.

      P3L122: “Note that in some studies the tension and the contractility are considered as equivalent, but they are distinguished in this study.”

      and

      P4L151: “The energy H included only the terms of the contact energy (Equation 1) and the area constraint (Equation 5), but neither of the surface elasticity terms (Equation 2 and Equation 3) was included, and thus the surface tension was determined by the contact energy.”

      Reviewer #2 (Recommendations For The Authors):

      (1) The model used is rather specific and it is rather confusing whether the issue is in the methodology or fundamental biophysics of apical constriction. For instance, one of the main narratives of the manuscript is that the Cellular Potts model better predicts apical constriction and tissue invagination than the vertex model. As I understand it, and as the authors state in p7 (line 210), "the difference between the vertex model and the cellular Potts model results was due to the straight lateral surface...". I assume that if apical constriction and tissue invagination were modelled with a vertex model with curved edges, while also allowing for cell rearrangements out of the tissue plane (some sort of epithelium-to-mesenchyme transition), the vertex model would yield exactly the same results as in the authors' cellular Potts model. If my understanding is correct, the authors should change the narrative of their manuscript and focus more on the comparison of a model with flat vs. curved edges, with "contractility" vs. "surface elasticity", with patterned apical contractility vs. non-patterned contractility (see my comment in point 2 below)... and not on comparison between CPM and VM.

      We appreciate the comments. The reviewer is correct that the vertex model can include the curved edges and the cell rearrangement, and it would reproduce the result of our cellular Potts model simulations. For the cellular Potts model, there was no need to specifically design how much the cell surface could be curved in a large arc, zigzag, or other shape, and that enabled us to find the conditions of delamination and bending.

      We added it to the discussion as below.

      P12L311: “Note that the vertex model could also be extended to incorporate the curved edges and rearrangement of the cells by specifically programming them, and would reproduce the cell delamination. That is, we could find the importance of the balanced pressure because the cellular Potts model intrinsically included a high degree of freedom for the cell shape, the cell rearrangement, and the fluctuation.”

      (2) About physics... and I think this is a really important point: one of the observations in the model was that in the "contractility" model, only "edge cells" shrank their apical surfaces, while inner cells remained quadrilateral. Related to this, the authors say that one of the requirements for proper apical constriction is a mechanism that "simultaneously shrinks the apical surface among cells in a cluster". What would happen if the authors assumed patterned contractility, meaning that cells in the center of the cluster would be most apically-contractile, while those further away from the center, would not be contractile? Features like this were investigated in studies of ventral-furrow invagination [see, for instance, Spahn and Reuter PLOS ONE (2013) and Rauzi et al. Nat Commun (2015)-Fig. S13d].

      We thank the reviewer for the critical comment, and ran simulations with the patterned apical contractility. The apical contractility following a gradient of parabola shape succeeded in the simultaneous apical shrinkage. However, it was weak against fluctuations and the cells were delaminated by chance.

      We added it to the Apical constriction by modified apical elasticity section in the Results as below.

      P9L252: “We also tested another model for the simultaneous apical shrinkage, a gradient contractility model (Spahn and Reuter, 2013; Rauzi et al., 2015). If the inner cells bear higher apical surface contractility than the edge cells, the inner cells may shrink their apical surface. To synchronize the apical shrinkage, the apical contractility must follow a parabola shape gradient. Even though the gradient contractility enabled the cells to shrink the apical surface simultaneously, often some of the cells shrank faster than neighbors and were delaminated by chance (Figure 4—figure supplement 1).”

      (3) The quality of the figures should be improved. Especially, Figure 3 and the related explanation in lines 183-192. This explanation is way too complicated and it is not clear what Figure 3c shows. For instance: if the arrows are indeed showing contractile forces (as written in the caption) then they are not illustrated correctly, but should be tangential to the cell membrane.

      We updated the figure.

      (4) The figures mostly show steady-state cross-sections from simulations. I miss a more dedicated study with model parameters being varied through wider ranges and some phase diagrams being shown etc. Also, some results could probably be supported by analytic calculations. For instance, the condition for stability (discussed in p4 lines 145-151), cells' preferred aspect ratio, cells' preferred "wedgeness" i.e., local curvature etc... I am sure some of these, if not all, could be calculated analytically and then these analytic results could help to interpret the phase diagrams.

      For the simulation results shown in the figures, we were not sure whether the simulation results were in a steady state. We added it to the Tissue deformation by increased apical contractility simulated with cellular Potts model section in the Results as below.

      P4L170: “A simulation started from a flat monolayer of cells beneath the apical ECM, and was continued until resulting deformation of cells and tissue could be evaluated for success or failure of reproducing the apical constriction.”

      For the ranges of parameters, we ran the simulation in wider range and showed results from sub-range. We added it to Parameters for the simulations section in Methods as below.

      P18L464: “The parameters were varied in a range, and the figures showed simulations with parameter values within a sub-range so that the results showed both success and failure in a development of interest.”

      For the analytical calculations, the Figure 3f shows a kind of phase diagram for shapes of a single cell. To clarify this, we rephrased “map of cell shapes” to “Phase diagram of cell shapes” in the figure legend, and added an explanation to the Results section as below.

      P6L207: “For the analysis of the cell shape in motion, we plotted a phase diagram for shapes of a single cell (Figure 3f).”

      For the analytical evaluation of the cellular Potts model simulations, a previous study performed a similar analysis, but it concerned a cell of isotropic shape in a steady state (Magno et al., 2015, doi:10.1186/s13628-015-0022-x). Also, our simulation framework is computationally expensive, and we could not vary the parameters at fine resolution. Therefore, we could not include it in this study.

      (5) I am not sure about the terminology "contractility" vs. "elasticity". In Farhadifar et al. (2007) "contractility" is described by a squared apical-perimeter energy term, while in this work, the authors describe it by a surface-energy-like term.

      In general, elasticity is the ability of a material to resist deformation and to return to its original shape and size. In Farhadifar et al. (2007), the cell apical area was assigned an area elasticity in this sense. Contractility is the ability to decrease in size or length, and it can therefore be expressed as either a linear or a quadratic term, depending on the model. In this study, we assumed that cell-cell/cell-ECM adhesion and myosin activity generate the surface contractility, and thus employed the linear expression. In Farhadifar et al. (2007), this was described as a line tension.

      We used the terms surface ‘elasticity’ and ‘contractility’ as distinct elements composing the surface ‘tension’. We added this to the “Extended cellular Potts model to simulate epithelial deformations” section of the Results, as below.

      P3L122: “Note that in some studies the tension and the contractility are considered as equivalent, but they are distinguished in this study.”
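
      To make the distinction in the reply to comment (5) concrete, the two ways of writing a contractile term can be compared side by side. This is a sketch rather than the exact Hamiltonian of this study: the first expression is the vertex-model energy as it is commonly written following Farhadifar et al. (2007), and the second is a generic linear surface term in which $J_s$ (a contractility coefficient) and $P_s$ (the length of surface segment $s$) are illustrative notation assumed here.

```latex
% Vertex-model energy as commonly written after Farhadifar et al. (2007):
% area elasticity + line tension on junctions + quadratic perimeter contractility
E = \sum_{\alpha} \frac{K_\alpha}{2}\left(A_\alpha - A_\alpha^{(0)}\right)^2
  + \sum_{\langle i,j \rangle} \Lambda_{ij}\, l_{ij}
  + \sum_{\alpha} \frac{\Gamma_\alpha}{2}\, L_\alpha^{2}

% Generic linear surface term (assumed notation, not the manuscript's exact form)
H_{\mathrm{surface}} = \sum_{s} J_s\, P_s

% Resulting tensions: the derivative of each term with respect to its length
\frac{\partial}{\partial L_\alpha}\!\left(\frac{\Gamma_\alpha}{2} L_\alpha^{2}\right) = \Gamma_\alpha L_\alpha,
\qquad
\frac{\partial}{\partial P_s}\!\left(J_s P_s\right) = J_s
```

      In the quadratic form the tension grows with the current perimeter, whereas in the linear form it is a constant set by the coefficient alone, which is the sense in which a linear surface term resembles a line tension.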

      (6) It is not entirely clear what are apical, basal, lateral, and cell "perimeters". This is a 2D model, so I assume all P-s are in fact interface lengths. In either case, this needs to be explained more clearly.

      We updated the explanation in the “Extended cellular Potts model to simulate epithelial deformations” section of the Results, as below.

      P3L111: “The cell's perimeter was partitioned automatically based on adjacency with other cells, and it was marked as apical, lateral, basal. Also, apico-lateral sites were marked as a location for the adherens junction. This cell representation also cast the vertical section of the cell. Therefore an area of the cell corresponded with a body of the cell, and a perimeter of the cell corresponded with the cell surface. Likewise the apical, lateral, and basal parts of the perimeter corresponded with the apical surface, cell-cell interface, and the basal surface of the cell respectively.”

      (7) The term H_{mc} is not clear at all. Why is this term called potential energy? What is U(i)? What is the exact biophysical interpretation of this term in 2D vs 3D?

      In 3D, the supracellular myosin cable is formed encircling the cells deformed by the apical constriction. Shrinking of the supracellular myosin cable makes the circle small, and it moves the cable toward the center of the circle. To simulate this motion of the supracellular myosin cable in the 2D cross section, we assigned the force exerted on the adherens junction of the boundary cells pulling toward the center, and because the force is relative to the position of the adherens junction and the center, it was expressed by the potential energy in the simulation.

      We updated the “Extended cellular Potts model to simulate epithelial deformations” section of the Results and the “Cellular Potts model with potential energy” section of the Methods, as below.

      P4L140: “The potential energy was defined by a scalar field which made a horizontal gradient decreasing toward the center,”

      and

      P17L449: “In 3D, tension on a circular actomyosin cable would shrink the circle, and the shrinkage would pull the cable toward the center of the circle. In 2D cross section, the cable is pulled horizontally toward the middle line.”
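
      The reply to comment (7) can be illustrated with a minimal sketch of how a potential-energy term of this kind might enter a cellular Potts update. This is not the authors' implementation: the linear form of the field, the function names (`potential`, `delta_h_mc`, `accept_move`), and the parameters `strength` and `temperature` are all assumptions made for illustration. The acceptance rule is the standard Metropolis rule used in cellular Potts models.

```python
import math
import random


def potential(x, center_x, strength):
    """Hypothetical scalar field U(x): lowest at the midline, rising linearly
    with horizontal distance from it (a horizontal gradient toward the center)."""
    return strength * abs(x - center_x)


def delta_h_mc(old_x, new_x, center_x, strength):
    """Change in the potential-energy term when an adherens-junction site
    moves horizontally from old_x to new_x."""
    return potential(new_x, center_x, strength) - potential(old_x, center_x, strength)


def accept_move(delta_h, temperature, rng=random.random):
    """Metropolis rule: always accept energy-lowering moves, otherwise accept
    with probability exp(-delta_h / temperature)."""
    if delta_h <= 0:
        return True
    return rng() < math.exp(-delta_h / temperature)


# A junction moving one site toward the midline lowers the energy,
# so the move is always accepted.
dh_inward = delta_h_mc(old_x=10, new_x=9, center_x=0, strength=2.0)
inward_accepted = accept_move(dh_inward, temperature=1.0)
```

      Because the energy depends only on the horizontal position relative to the midline, moves toward the center are favored regardless of cell shape, which mimics the inward pull of a shrinking circular cable seen in cross section.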

      (8) Highten->increased

      We updated the text.

      (9) "It seems natural to consider that the myosin generates a force proportional to its density but not to the surface width nor the strain". This sentence should be supported by a reference. Also, if the force is proportional to myosin density, then it must depend on surface width, since density, I assume, is the number of motors per area.

      Regarding myosin density and generated force: in all preceding studies cited in this manuscript, and in others to the best of our knowledge, the density of myosin and actin filaments visualized by staining or labeling has been assumed, without references, to reflect the generated contractility. We therefore regard it as a well-established and widely shared assumption.

      Regarding independence from the surface width and strain, the reviewer's comment is correct, but the results would be the same. If we presumed that the number of motors on the apical surface of a cell was constant during apical constriction, the density would increase as the apical surface contracted, which would make the apical contractility more unbalanced and promote delamination. We added this to the Results and Discussion, as below.

      P4L166: “For the sake of simplicity, we ignored an effect of the constriction on the apical myosin density, and discussed it later.”

      P14L328: “In our model, for the sake of simplicity, we ignored an effect of the constriction on the apical myosin density. If we presumed that the apical myosin would be condensed by the shrinkage of the apical surface, it would increase the apical tension in the shrinking cell and is expected to promote the cell delamination further. Therefore it would not change the results.”

      Reviewing Editor (Recommendations For The Authors):

      Please note also the following excerpts from discussions amongst the reviewers and the Reviewing Editor:

      Regarding Reviewer #2's Point 2:

      I believe the authors have assumed patterned contractility in their simulations, and this is shown by the "pale blue" cell color (see also lines 162-163). However, as Reviewer #2 points out in their point 2), the pale colors are very hard to see and therefore easy to miss.

      We updated the figure coloring and also added simulations with a gradient pattern of contractility.

      Regarding Reviewer #2's point 5:

      It is indeed unconventional to call the "J" terms contractility, they are usually called contact energy or adhesive energy.

      In this study, we included both the contact energy of cell-cell/cell-ECM adhesion and the actomyosin activity in the surface contractility, and used the “J” term as is conventional in the cellular Potts model.

      On the other hand, due to the parameters chosen for J_apical and J_basal in the pale blue cells, the apical membrane area will tend to shrink and the basal membrane will tend to enlarge. Because the lateral membrane energy J_lateral is constant among all cells (I think?), this will effectively drive cells to apically contract in the center.

      That expectation was an initial motivation of our study, but we found that the differential J alone could not drive the cells to apically contract in the center.

      I agree that extra clarification by the authors would be very helpful here.

      Reviewer #2:

      Regarding the patterned contractility: indeed, I missed this point (the pale blue region is really poorly visible).

      Nevertheless, it seems that contractility in the authors' model changes in a step-like fashion.

      [...] There may be important differences between furrowing under step-like patterning profile versus smooth "bell-like" patterning (see Supplementary Figure 13 in Rauzi et al. Nat Commun 2015). In particular, in the case of a step-like patterning, [there are] constrictions of side cells (similar to what the authors in this manuscript report), whereas in the bell-like patterning, [...] such side constrictions [do not occur].

      As stated in our reply to Reviewer #2's comment (2), we added simulations with gradient-pattern contractility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We extend our sincere gratitude to the reviewers for their constructive feedback and valuable suggestions, which have significantly contributed to enhancing the quality of our work. In response to the comments, we have meticulously revised our manuscript with the following updates:

      (1) New Data Inclusion: We have incorporated new immunofluorescent staining images, FACS analysis of monocytes, and single-cell RNA sequencing (scRNAseq) expression analysis focusing on genes related to IFNGR, as well as T cell memory subsets (Trm, Tcm, and Tem).

      (2) Comparative Analysis: We have conducted a comparative analysis between the active vitiligo dFBs and the ACD pAd (r5) identified in our study, which provides further insight into the immune response mechanisms.

      (3) Discussion Expansion: We have expanded the discussion to include the role of tissue-resident memory (Trm) T cells in our model and have addressed the limitations of our animal model and in vitro studies.

      (4) Supplemental Material: As requested by the reviewers, we have provided four new supplemental tables (Table S2 ~ S5) and specific information for antibodies used in our study.

      Please see our Point-to-Point Responses to Reviewers' comments below:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Liu et al. used scRNA-seq to characterize cell type-specific responses during allergic contact dermatitis (ACD) in a mouse model, specifically the hapten-induced DNFB model. Using the scRNA-seq data, they deconvolved the cell types responsible for the expression of major inflammatory cytokines such as IFNG (from CD4 and CD8 T cells), IL4/13 (from basophils), IL17A (from gd T cells), and IL1B from neutrophils and macrophages. They found the highest upregulation of a type 1 inflammatory response, centering around IFNG produced by CD4 and CD8 T cells. They further identified a subpopulation of dermal fibroblasts that upregulate CXCL9/10 during ACD and provided functional genetic evidence in their mouse model that disrupting IFNG signaling to fibroblasts decreases CD8 T cell infiltration and overall inflammation. They identify an increase in IFNG-expressing CD8 T cells in human patient samples of ACD vs. healthy control skin and co-localization of CD8 T cells with PDGFRA+ fibroblasts, which suggests this mechanism is relevant to human ACD. This mechanism is reminiscent of recent work (Xu et al., Nature 2022) showing that IFNG signaling in dermal fibroblasts upregulates CXCL9/10 to recruit CD8 T cells in a mouse model of vitiligo. Overall, this is a very well-presented, clear, and comprehensive manuscript. The conclusions of the study are mostly well supported by data, but some aspects of the work could be improved by additional clarification of the identity of the cell types shown to be involved, including the exact subpopulation discovered by scRNA-seq and the subtype of CD8 T cell involved. The study was limited by its use of one ACD model (DNFB), which prevents an assessment of how broadly relevant this axis is. The human sample validation is slightly circumstantial and limited by the multiplexing capacity of immunofluorescence markers.

      Strengths:

      Through deep characterization of the in vivo ACD model, the authors were able to determine which cell types were expressing the major cytokines involved in ACD inflammation, such as IFNG, IL4/13, IL17A, and IL1B. These analyses are well-presented and thoughtful, showing first that the response is IFNG-dominant, then focusing on deeper characterization of lymphocytes, myeloid cells, and fibroblasts, which are also validated and complemented by FACS experiments using canonical markers of these cell types as well as IF staining. Crosstalk analyses from the scRNA-seq data led the authors to focus on IFNG signaling fibroblasts, and in vitro experiments demonstrate that CXCL9 and CXCL10 are expressed by fibroblasts stimulated by IFNG. In vivo functional genetic evidence demonstrates an important role for IFNG signaling in fibroblasts, as KO of Ifngr1 using Pdgfra-Cre Ifngr1 fl/fl mice, showed a reduction in inflammation and CD8 T cell recruitment.

      Weaknesses:

      (1) The use of one model limits an understanding of how broad this fibroblast-T cell axis is during ACD. However, the authors chose the most commonly employed model and cited additional work in a vitiligo model (another type 1 immune response).

      We thank the reviewer for pointing out this limitation. Although the DNFB-elicited ACD model is the most commonly used animal model for ACD, our study is limited by the use of only one type 1 immune response model. We have now added new data (Figure 5-figure supplement 1A) showing that the active ACD pAd (r5) and the active IFNγ-responsive vitiligo dFBs (Xu et al., 2022) are enriched with a highly similar panel of IFNγ-inducible genes. Future studies are still needed to determine whether this fibroblast-T cell axis applies broadly to other ACD models or to other type 1 immune response-related inflammatory skin diseases.

      (2) The identity of the involved fibroblasts and T cells in the mouse model is difficult to assess as scRNA-seq identified subpopulations of these cell types, but most work in the Pdgfra-Cre Ifngr1 fl/fl mice used broad markers for these cell types as opposed to matched subpopulation markers from their scRNA-seq data.

      We thank the reviewer for the constructive comments. To better showcase the dWAT layer where PDGFRA+ pAds are enriched, we have included new histological staining and PLIN1 (adipocyte marker) staining in new Figure 4 - figure supplement 1F-G. As shown in Figure 4 - figure supplement 1G, the PLIN1+ dWAT layer is located in the lower dermis right above the cartilage layer. In Figure 4-figure supplement 1I and J, we have shown that phospho-STAT1 (pSTAT1), a key signaling molecule activated by IFNγ, was detected primarily in PDGFRA+Ly6A+ pAds in the lower dermis where dWAT is located. In addition, we have now included new data showing that the pAd (dFB_r5) cluster preferentially expressed the highest levels of both Ifngr1 and Ifngr2 among all dFB subclusters (new Figure 5 - figure supplement 1B). Furthermore, we have included new co-staining data showing that CXCL9 largely co-localized with ICAM1 (new Figure 4 - figure supplement 1K), a marker for committed pAds (Merrick et al., 2019), in the reticular dermis and dWAT region of the ACD skin, further confirming that CXCL9 is specifically induced in the pAd subset of dFBs. Additionally, we included new staining data showing that ACD-mediated induction of CXCL9 in ICAM1+ dFBs was largely suppressed upon targeted deletion of Ifngr1 in Pdgfra+ dFBs (new Figure 6 - figure supplement 1D-E).

      (3) Human patient samples of ACD were co-stained with two markers at a time, demonstrating the presence of CD8+IFNG+ T cells, PDGFRA+CXCL10+ fibroblasts, and co-localization of PDGFRA+ fibroblasts and CD8+ T cells. However, no IF staining demonstrates co-expression of all 4 markers at once; thus, the human validation of co-localization of CD8+IFNG+ T cells and PDGFRA+CXCL10+ fibroblasts is ultimately indirect, although not a huge leap of faith. Although n=3 samples of healthy control and ACD samples are used, there is no quantification of any results to demonstrate the robustness of differences.

      We thank the reviewer for the constructive comments. We have shown that PDGFRA colocalizes with CXCL10 in the dermal microvascular structures, where CD8+ T cells infiltrate around PDGFRA+ dFBs. We are sorry that, owing to technical issues (antibody compatibility), we cannot provide the four-color co-staining suggested by the reviewer. To demonstrate the robustness and reproducibility of the staining presented, we have now supplemented 4 independent images for both Fig. 7A and Fig. 7E in the updated Figure 7-figure supplement 1A-B.

      Reviewer #2 (Public Review):

      Summary:

      The investigators apply scRNA seq and bioinformatics to identify biomarkers associated with DNFB-induced contact dermatitis in mice. The bioinformatics component of the study appears reasonable and may provide new insights regarding TH1-driven immune reactions in ACD in mice. However, the IF data and images of tissue sections are not clear and should be improved to validate the model.

      Strengths:

      The bioinformatics analysis.

      Weaknesses:

      The IF data presented in 4H, 6H, 7E and 7F are not convincing and need to be correlated with routine staining on histology and different IF markers for PDGFR. Some of the IF staining data demonstrates a pattern inconsistent with its target.

      We are sorry for the confusion. Figures 4H and 6H show staining of mouse skin sections, whereas 7E and 7F show staining of human skin sections; the patterns of PDGFRA+ dFBs therefore appear inconsistent between species. As shown in Fig. 4H, in mouse skin, PDGFRA+CXCL9/10+ dFBs are located between the lower reticular dermis and the dWAT region, where preadipocytes are located (Sun et al., 2023). To better showcase the dWAT layer where PDGFRA+ pAds are enriched, we have included new histological staining and PLIN1 (adipocyte marker) staining in new Figure 4 - figure supplement 1F-G. As shown in Figure 4 - figure supplement 1G, the PLIN1+ dWAT layer is located in the lower dermis right above the cartilage layer. Furthermore, we have included new co-staining data showing that CXCL9 largely co-localized with ICAM1 (new Figure 4 - figure supplement 1K), a marker for committed pAds (Merrick et al., 2019), in the reticular dermis and dWAT region of the ACD skin, further confirming that CXCL9 is specifically induced in the pAd subset of dFBs.

      As shown in Fig. 7E, in human skin, PDGFRA+CXCL10+ dFBs are located within the microvascular structures at the dermal-epidermal junction (DEJ) region, where mesenchymal stem cells are enriched (Russell-Goldman & Murphy, 2020). We have included the corresponding HE histological staining image for Fig. 4H in new Figure 4-figure supplement 1F. The histological staining corresponding to Fig. 6H is the HE staining image in Fig. 6F. The histological staining corresponding to Fig. 7E and 7F is the Masson’s trichrome staining (a three-colour histological stain) shown in Fig. 7C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) While the focus on fibroblast and T cell interactions and overall biological findings regarding these interactions (IFNG - CXCL9/10 - CXCR3) is sound, it is slightly confusing about which exact subpopulations of these cells are involved in ACD pathogenesis as both scRNA-seq and IF are used but very broad markers are used for IF. Regarding fibroblasts, the scRNA-seq identifies the pAd (r5) cluster of fibroblasts as the main producer of CXCL9/10. However, the expression of IFNGR1 was not shown for this subpopulation as well as for other fibroblast subpopulations. Figure 6C shows IFNGR1 staining in the Ifngr1 fl/fl control mice which appears quite broad. With the seemingly broad expression of IFNGR1, why is it that only a subpopulation of fibroblasts upregulate CXCL9/10? Is there a specific location of these pAd fibroblasts that help drive this IFNG response? Please show the expression of Ifngr1 in the fibroblast scRNA-seq data.

      We thank the reviewer for the constructive comments. We have now included new data showing that the pAd (dFB_r5) cluster preferentially expressed higher levels of both Ifngr1 and Ifngr2 than the other dFB subclusters (new Figure 5 - figure supplement 1B). In addition, we included new co-staining data showing that CXCL9 largely co-localized with ICAM1, a marker for committed pAds (Merrick et al., 2019), in the reticular dermis and dWAT region of the ACD skin, further confirming that CXCL9 is specifically induced in the pAd subset of dFBs.

      (2) Regarding T cells, it is slightly confusing regarding what role the fibroblast-produced CXCL9/10 plays on T cell migration vs. activation. This is mainly because in vitro work focuses on T cell activation, while in vivo work seems to mainly assess T cell migration into the tissue. The in vivo studies have nicely shown that CD8 T cells are the main cell type affected by Ifngr1 iKO (i.e., a reduction of these cells), but T cell activity in vivo is not assessed (in the form of IFNG production). I have the following related questions:

      a. Authors do not discuss whether T cells involved in ACD in their model are tissue-resident memory T cells (Trm) or whether these are recruited from circulation. This may be possible to assess via additional analysis of the scRNA-seq data (looking for expression of Trm markers). 

      We thank the reviewer for the constructive comments. We have now included new data showing the expression of marker genes of various memory T cells in various T cell subclusters (new Figure 2 - figure supplement 1C-D). Antigen-specific CD8 or CD4 memory T cells can be classified into CD62hi/CCR7hi/CD28hi/CD27hi/CX3CR1lo central memory T cells (Tcm), CX3CR1hi/Cd28hi/Cd27lo/CD62lo/CCR7lo effector memory T cells (Tem), and CD49ahi/CD103hi/CD69hi/BLIMP1hi tissue-resident memory T cells (Trm) (Benichou, Gonzalez, Marino, Ayasoufi, & Valujskikh, 2017; Cheon, Son, & Sun, 2023; Mackay et al., 2013; Martin & Badovinac, 2018; Park et al., 2023). We observed that in ACD skin, CD4+ and CD8+ T cells predominantly expressed marker genes associated with Tcm including Cd28, Cd27, Ccr7, and S1pr1/Cd62l. In contrast, marker genes associated with Tem (Cx3cr1) and Trm (Itga1/Cd49a, Itgae/Cd103, Cd69 and Prdm1/Blimp1, Cd127/Il7r) were only scarcely expressed in these αβ T cells, suggesting that ACD predominantly triggers a central memory T cell response in the skin.

      Furthermore, this hypothesis is supported by new lymph node gene expression results. We showed that the expression of Ifng, but not Il4 or Il17a, was rapidly induced in skin draining lymph nodes at 24 hours after ACD elicitation (new Figure 1-figure supplement 1H). This suggests a robust and systemic activation of type 1 memory T cell response in the early stage of ACD, and the migration of these lymphatic memory T cells to the skin may contribute to the exacerbation of skin inflammation.

      b. Authors have focused on CXCR3 axis involvement in IFNG production (Figures 5G-H) without assessing the presumed migratory role of this axis. Presumably, CD8 T cells are recruited to the skin via the CXCL9/10-CXCR3 axis, but this would be important to clarify given other work that has demonstrated Trm involvement in ACD. Authors should at least discuss how their model and findings support, refine, or even contradict the current paradigm of Trm involvement in ACD (Lefevre et al., 2021; PMID: 34155157).

      We are grateful for the constructive feedback provided by the reviewer. CXCR3 is a chemokine receptor on T cells that not only plays a pivotal role in the trafficking of type 1 T cells but is also required for optimal generation of IFNG-secreting type 1 T cells in vivo (Groom et al., 2012). Our in vitro study is limited in that it focuses only on the involvement of the CXCL9/10-CXCR3 axis in IFNγ production, without studying its role in driving T cell migration. We have now addressed this limitation in the discussion section.

      In the murine model of ACD, the initial sensitization phase involves exposing mouse skin to a high dose of DNFB to prime effector T cells in lymphoid organs, and this is followed by a later challenge/elicitation phase, during which the mice are re-exposed to a lower dose of DNFB in a different area of the skin, distal from the original sensitization site (Manresa, 2021; Vocanson, Hennino, Rozieres, Poyet, & Nicolas, 2009). Our updated analysis of the expression of marker genes associated with central memory T cells (Tcm), effector memory T cells (Tem), and tissue-resident memory T cells (Trm), as presented in the revised Figure 2-figure supplement 1C-D, indicates that the type-1 inflammation observed upon ACD elicitation is predominantly driven by memory T cells recruited from lymphoid organs, rather than by skin-resident memory T cells. We have read the reference provided by the reviewer along with a few other related studies indicating that Trm cells are involved in ACD. We found that these studies performed the elicitation phase on the same skin area where the initial sensitization was conducted, and only in that setting does elicitation result in a rapid allergen-induced skin inflammatory response that is primarily mediated by IL17A-producing and IFNγ-producing CD8+ skin-resident memory T cells (Gadsboll et al., 2020; Murata & Hayashi, 2020; Schmidt et al., 2017; Wongchang et al., 2023). These studies suggest that Trm cells establish a long-lasting local memory during the initial sensitization, and upon re-exposure to the hapten in the same skin area, these site-specific Trm cells can rapidly contribute to a robust type-1 skin inflammatory response. Therefore, a robust involvement of Trm cells in ACD requires repeated exposure of the same skin area to the same hapten. We have now added a related discussion to the discussion section.

      c. While it may be difficult to assess given reduced numbers of CD8 T cells in the Ifngr1 iKO, is the CXCL9/10-CXCR3 axis affecting IFNG production by T cells in vivo?

      Yes, we have shown in Fig. 6G that ACD-mediated induction of Ifng was significantly suppressed in the Ifngr1-iKO mice compared to the control mice.

      (3) The authors cite prior work (Xu et al. Nature 2022) that demonstrated a similar mechanism for fibroblasts in recruiting vitiligo-inducing T cells. Are the pAd (r5) cluster of fibroblasts similar to the fibroblast subpopulation that drives vitiligo?

      The study of the mouse model of vitiligo (Xu et al. Nature 2022) did not perform single-cell RNAseq of the vitiligo mouse skin. Instead, the authors conducted RNAseq analysis on sorted PDGFRA+ dFBs. Therefore, we cannot directly compare our pAd (r5) cluster with the fibroblast subpopulation that drives vitiligo. Nevertheless, by using a Venn diagram to compare the top 100 IFNγ signaling-dependent genes upregulated in the active vitiligo mouse dFBs with the top 100 genes enriched in our ACD pAd (dFB_r5) cells, we identified 29 genes commonly upregulated in the two conditions (Figure 5-figure supplement 1A). Furthermore, all 29 of these genes were among the top IFNγ-inducible genes in primary dFBs. These shared genes include CXCL9, CXCL10, and several other downstream targets of IFNγ signaling, such as B2M, BST2, CD274, as well as the GBP family members GBP3, GBP4, GBP5, GBP7, and additional genes like H2-K1, H2-Q4, H2-Q7, H2-T23, IFIT3, ISG15, and STAT1. This result suggests that the pAd (dFB_r5) cells share a common IFNγ-pathway gene signature with the active vitiligo mouse dFBs, indicating a potential overlap in molecular pathways.
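
      The Venn-diagram comparison described above amounts to intersecting two top-N gene lists. A minimal sketch follows, assuming each list is already ranked and that gene symbols should be matched case-insensitively; the function name `shared_genes` and the toy lists are hypothetical and are not the study's data.

```python
def shared_genes(ranked_a, ranked_b, top_n=100):
    """Return the sorted gene symbols found in the top_n entries of both lists.

    Symbols are upper-cased before comparison so that differently capitalized
    spellings (e.g. Cxcl9 and CXCL9) are treated as the same gene. This
    normalization is an assumption of the sketch.
    """
    top_a = {gene.upper() for gene in ranked_a[:top_n]}
    top_b = {gene.upper() for gene in ranked_b[:top_n]}
    return sorted(top_a & top_b)


# Toy ranked lists for illustration only (not real results).
vitiligo_top = ["Cxcl9", "Cxcl10", "Stat1", "PlaceholderA"]
acd_pad_top = ["CXCL9", "STAT1", "PlaceholderB", "CXCL10"]

overlap = shared_genes(vitiligo_top, acd_pad_top)
print(len(overlap), overlap)
```

      The size of the intersection is the number reported in the overlapping region of the Venn diagram; with real data, each input would be the top 100 genes from the corresponding ranked table.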

      (4) The authors should include bulk RNA-seq data from fibroblast stimulation (Figure 5b) at a minimum in the GEO submission. They should ideally include the differentially expressed genes in a supplementary table.

      We thank the reviewer for the constructive comments. We have now included the raw FPKM file for the bulk RNAseq data shown in Fig. 5 in Supplemental Table S3, and the list of differentially expressed genes in Supplemental Table S4.

      (5) The authors state that human sample stainings were n = 3 per group for healthy control and ACD (Figure 7), but no quantification or statistical testing is provided to demonstrate significant differences in findings such as co-localization of fibroblasts and T cells, IFNG+CD8+ T cells, etc.

      Thanks for the reviewer’s constructive comments. We have now supplemented 4 independent images for both Fig. 7A and Fig. 7E in the new Figure 7-figure supplement 1A-B to demonstrate the robustness and reproducibility of the staining presented.

      Minor comments:

      (1) Figure 1G, possible typos, Il14 and Il11b are on the violin plots when I believe authors meant Il4 and Il1b.

      Thanks a lot for pointing out these typos. We have now corrected them in Figure 1 of the updated manuscript.

      (2) The authors label cluster 27 as neutrophils based on the expression of Ly6g and S100a8. These markers are also expressed by Cd14+ inflammatory monocytes. I believe the authors need to additionally validate that these cells are neutrophils (via staining or additional analyses). Neutrophils are notoriously difficult to capture in scRNA-seq given low RNA content. Later, they are quantified by FACS using CD11b+Ly6G+ markers, but I do not believe this would distinguish them from CD14+ monocytes. As this is a relatively minor aspect of the manuscript, I consider this a minor concern, but a finding that should be as accurate as possible as Il1b is likely important, and identifying its accurate source likewise.

      Thanks a lot for the reviewer’s constructive comments. Following the reviewer’s suggestion, we have now added Cd14 expression to Figure 1C and found that cluster 27 indeed expressed not only Ly6G but also Cd14. According to the literature, the expression of Ly6G in circulating blood, spleen, and peripheral tissues is limited to neutrophils, whereas monocytes, macrophages, and lymphocytes are negative for Ly6G (Ikeda et al., 2023; Lee, Wang, Parisini, Dascher, & Nigrovic, 2013). Therefore, Ly6G can be used as a marker to distinguish neutrophils from monocytes. Although CD14 is highly expressed in monocytes, neutrophils can also express CD14 at a lower level (Antal-Szalmas, Strijp, Weersink, Verhoef, & Van Kessel, 1997). Therefore, cluster 27 is likely a mixed population of neutrophils and monocytes, and we have renamed this cluster NEU/Mon in the updated manuscript.

      To confirm the presence of neutrophils and monocytes in ACD, we have included new FACS analysis of inflammatory monocytes, which are gated as CD11B+Ly6G-F4/80-CD11C-Ly6Chi, according to a published FACS protocol (Rose, Misharin, & Perlman, 2012). We found that elicitation of ACD led to a transient influx of monocytes at 24 hrs post treatment, whereas the percentage of neutrophils continued to increase up to 60 hours post-treatment (Figure 3L, and Figure 3-figure supplement 1G). In addition, at 60 hrs, the percentage of neutrophils (~5%) was > 10 times greater than the percentage of monocytes (~0.4%), indicating that neutrophils are the dominant granulocytes at 60 hours post ACD elicitation.

      (3) The authors should include a cluster marker table as a supplementary file to accompany Figure 1C. Only top cluster markers are shown in 1C.

      Thanks a lot for the reviewer’s constructive comments. We have now included the top 5 enriched genes in each cell cluster shown in Fig. 1C in Supplementary Table S2.

      (4) Figures 2A/B have mismatched labels. There is a gdT/ILC2 label in the 2B, but not in 2A. Please match these. Along these lines, which gdT cluster is the IL17A expressing cluster as shown in 1D? Matching these labels will clarify which population is doing what.

      Thanks a lot to the reviewer for pointing out this mistake. To avoid confusion about the T cell clusters, we have added specific recluster numbers, r0~r7, to the T cell clusters (Figure 2A-B). The r4 cluster is a mixed population of γδT cells and ILC2, and is therefore termed γδT/ILC2. As shown in Figure 2-figure supplement 1E, IL17A is primarily expressed in the γδT cell cluster (r5). We have now corrected γδT2 to γδT/ILC2 throughout the manuscript. To avoid confusion, we have now added cluster # in updated Figure 2D.

      (5) In Figure 3E, the authors used CD11B as a distinguishing marker for basophils (CD11B+) vs. mast cells (CD11B-). Mcpt8 is a better distinguishing marker, so I am wondering why the authors chose CD11B.

      Thanks a lot for the reviewer’s comments. In scRNAseq, we did use Mcpt8 as a basophil-specific marker to distinguish basophils from mast cells (see Figure 1C). However, Mcpt8 is not a surface receptor that can be used in FACS analysis. Therefore, to distinguish basophils from mast cells by FACS, we had to choose surface markers expressed on these cells. FcεR1a is a highly specific marker expressed exclusively on basophils and mast cells, and CD11B is expressed on basophils but not on mature mast cells (Hamey et al., 2021). As a result, FACS analysis of the surface expression of CD11B and FcεR1a can distinguish basophils (CD11B+ FcεR1a+) from mast cells (CD11B- FcεR1a+). The use of CD11B and FcεR1a to distinguish basophils from mast cells can also be seen in a published reference study (Arinobu et al., 2005).
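
      The two-marker gating logic described above can be written out as a small decision rule. The sketch below is illustrative only: it assumes each event has already been scored as positive or negative for each marker (real gates are drawn on fluorescence intensities, usually after upstream gates), and the function name `classify_fcer1a_event` is hypothetical.

```python
def classify_fcer1a_event(cd11b_positive, fcer1a_positive):
    """Assign a cell type from two pre-thresholded surface markers.

    FcεR1a+ CD11B+ -> basophil
    FcεR1a+ CD11B- -> mast cell
    FcεR1a-        -> neither (outside this gate)
    """
    if not fcer1a_positive:
        return "other"
    return "basophil" if cd11b_positive else "mast cell"


# Toy events as (CD11B positive?, FcεR1a positive?) pairs; not real data.
events = [(True, True), (False, True), (True, False), (False, False)]
labels = [classify_fcer1a_event(cd11b, fcer1a) for cd11b, fcer1a in events]
print(labels)
```

      FcεR1a acts as the entry gate here because it is the marker shared by both populations, and CD11B is then used only to split the FcεR1a+ events.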

      (6) Antibody information is missing for IF studies. No clones, catalog numbers, vendors, RRIDs, or dilutions are included in the Methods section for any of the IF data.

      We thank the reviewer for the constructive comments. We have now added the relevant information for all antibodies used for the FACS and IF data in the Methods section.

      (7) Figure 3 supplement E and F appear to be reversed based on legend descriptions.

      Thanks a lot for pointing this out. We have now made the correction in the updated Supplementary file.

      References:

      Antal-Szalmas, P., Strijp, J. A., Weersink, A. J., Verhoef, J., & Van Kessel, K. P. (1997). Quantitation of surface CD14 on human monocytes and neutrophils. J Leukoc Biol, 61(6), 721-728. doi:10.1002/jlb.61.6.721

      Arinobu, Y., Iwasaki, H., Gurish, M. F., Mizuno, S., Shigematsu, H., Ozawa, H., . . . Akashi, K. (2005). Developmental checkpoints of the basophil/mast cell lineages in adult murine hematopoiesis. Proc Natl Acad Sci U S A, 102(50), 18105-18110. doi:10.1073/pnas.0509148102

      Benichou, G., Gonzalez, B., Marino, J., Ayasoufi, K., & Valujskikh, A. (2017). Role of Memory T Cells in Allograft Rejection and Tolerance. Front Immunol, 8, 170. doi:10.3389/fimmu.2017.00170

      Cheon, I. S., Son, Y. M., & Sun, J. (2023). Tissue-resident memory T cells and lung immunopathology. Immunol Rev, 316(1), 63-83. doi:10.1111/imr.13201

      Gadsboll, A. O., Jee, M. H., Funch, A. B., Alhede, M., Mraz, V., Weber, J. F., . . . Bonefeld, C. M. (2020). Pathogenic CD8(+) Epidermis-Resident Memory T Cells Displace Dendritic Epidermal T Cells in Allergic Dermatitis. J Invest Dermatol, 140(4), 806-815 e805. doi:10.1016/j.jid.2019.07.722

      Groom, J. R., Richmond, J., Murooka, T. T., Sorensen, E. W., Sung, J. H., Bankert, K., . . . Luster, A. D. (2012). CXCR3 chemokine receptor-ligand interactions in the lymph node optimize CD4+ T helper 1 cell differentiation. Immunity, 37(6), 1091-1103. doi:10.1016/j.immuni.2012.08.016

      Hamey, F. K., Lau, W. W. Y., Kucinski, I., Wang, X., Diamanti, E., Wilson, N. K., . . . Dahlin, J. S. (2021). Single-cell molecular profiling provides a high-resolution map of basophil and mast cell development. Allergy, 76(6), 1731-1742. doi:10.1111/all.14633

      Ikeda, N., Kubota, H., Suzuki, R., Morita, M., Yoshimura, A., Osada, Y., . . . Asano, K. (2023). The early neutrophil-committed progenitors aberrantly differentiate into immunoregulatory monocytes during emergency myelopoiesis. Cell Rep, 42(3), 112165. doi:10.1016/j.celrep.2023.112165

      Lee, P. Y., Wang, J. X., Parisini, E., Dascher, C. C., & Nigrovic, P. A. (2013). Ly6 family proteins in neutrophil biology. J Leukoc Biol, 94(4), 585-594. doi:10.1189/jlb.0113014

      Mackay, L. K., Rahimpour, A., Ma, J. Z., Collins, N., Stock, A. T., Hafon, M. L., . . . Gebhardt, T. (2013). The developmental pathway for CD103(+)CD8+ tissue-resident memory T cells of skin. Nat Immunol, 14(12), 1294-1301. doi:10.1038/ni.2744

      Manresa, M. C. (2021). Animal Models of Contact Dermatitis: 2,4-Dinitrofluorobenzene-Induced Contact Hypersensitivity. Methods Mol Biol, 2223, 87-100. doi:10.1007/978-1-0716-1001-5_7

      Martin, M. D., & Badovinac, V. P. (2018). Defining Memory CD8 T Cell. Front Immunol, 9, 2692. doi:10.3389/fimmu.2018.02692

      Merrick, D., Sakers, A., Irgebay, Z., Okada, C., Calvert, C., Morley, M. P., . . . Seale, P. (2019). Identification of a mesenchymal progenitor cell hierarchy in adipose tissue. Science, 364(6438). doi:10.1126/science.aav2501

      Murata, A., & Hayashi, S. I. (2020). CD4(+) Resident Memory T Cells Mediate Long-Term Local Skin Immune Memory of Contact Hypersensitivity in BALB/c Mice. Front Immunol, 11, 775. doi:10.3389/fimmu.2020.00775

      Park, S. L., Christo, S. N., Wells, A. C., Gandolfo, L. C., Zaid, A., Alexandre, Y. O., . . . Mackay, L. K. (2023). Divergent molecular networks program functionally distinct CD8(+) skin-resident memory T cells. Science, 382(6674), 1073-1079. doi:10.1126/science.adi8885

      Rose, S., Misharin, A., & Perlman, H. (2012). A novel Ly6C/Ly6G-based strategy to analyze the mouse splenic myeloid compartment. Cytometry A, 81(4), 343-350. doi:10.1002/cyto.a.22012

      Russell-Goldman, E., & Murphy, G. F. (2020). The Pathobiology of Skin Aging: New Insights into an Old Dilemma. Am J Pathol, 190(7), 1356-1369. doi:10.1016/j.ajpath.2020.03.007

      Schmidt, J. D., Ahlstrom, M. G., Johansen, J. D., Dyring-Andersen, B., Agerbeck, C., Nielsen, M. M., . . . Bonefeld, C. M. (2017). Rapid allergen-induced interleukin-17 and interferon-gamma secretion by skin-resident memory CD8(+) T cells. Contact Dermatitis, 76(4), 218-227. doi:10.1111/cod.12715

      Sun, L., Zhang, X., Wu, S., Liu, Y., Guerrero-Juarez, C. F., Liu, W., . . . Zhang, L. J. (2023). Dynamic interplay between IL-1 and WNT pathways in regulating dermal adipocyte lineage cells during skin development and wound regeneration. Cell Rep, 42(6), 112647. doi:10.1016/j.celrep.2023.112647

      Vocanson, M., Hennino, A., Rozieres, A., Poyet, G., & Nicolas, J. F. (2009). Effector and regulatory mechanisms in allergic contact dermatitis. Allergy, 64(12), 1699-1714. doi:10.1111/j.1398-9995.2009.02082.x

      Wongchang, T., Pluangnooch, P., Hongeng, S., Wongkajornsilp, A., Thumkeo, D., & Soontrapa, K. (2023). Inhibition of DYRK1B suppresses inflammation in allergic contact dermatitis model and Th1/Th17 immune response. Sci Rep, 13(1), 7058. doi:10.1038/s41598-023-34211-x

      Xu, Z., Chen, D., Hu, Y., Jiang, K., Huang, H., Du, Y., . . . Chen, T. (2022). Anatomically distinct fibroblast subsets determine skin autoimmune patterns. Nature, 601(7891), 118-124. doi:10.1038/s41586-021-04221-8

    1. Author response:

      The following is the authors’ response to the original reviews.

      Main points:

      (1) We have added data for fructose in Fig. 1

      (2) We have added statistics (red stars and NS) comparing Tp between fed and refed flies.

      (3) We have changed the data points in each figure to small open circles.

      (4) We have moved the data from Fig. S3 to Fig. 2 and 3.

      (5) We have added schematic diagrams depicting the behavioral assay in Fig. S1.

      (6) We have added heatmaps for WT and Gr64f-Gal4>UAS-CsChrimson flies in Fig. S2.

      (7) We have added Orco1 mutant data in Fig. S4.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents valuable findings that gustation and feeding state influence the preferred environmental temperature in flies. Interestingly, the authors showed that by refeeding starved animals with the non-nutritive sugar sucralose, they are able to tune their preference towards a higher temperature in addition to nutrient-dependent warm preference. The authors show that temperature-sensing and sweet-sensing gustatory neurons (SGNs) are involved in the former but not the latter. In addition, their data indicate that peptidergic signals involved in internal state and clock genes are required for taste-dependent warm preference behavior.

      The authors made an analogy of their results to the cephalic phase response (CPR) in mammals where the thought, sight, and taste of food prepare the animal for the consumption of food and nutrients. They further linked this behavior to core regulatory genes and peptides controlling hunger and sleep in flies having homologues in mammals. These valuable behavioral results can be further investigated in flies with the advantage of being able to dissect the neural circuitry underlying CPR and nutrient homeostasis.

      Strengths: 

      (1) The authors convincingly showed that tasting is sufficient to drive warm temperature preference behavior in starved flies and that it is independent of nutrient-driven warm preference. 

      (2) By using the genetic manipulation of key internal sensors and genes controlling internal feeding and sleep states such as DH44 neurons and the per genes for example, the authors linked gustation and temperature preference behavior control to the internal state of the animal. 

      Weaknesses: 

      (1) The title is somewhat misleading, as the term homeostatic temperature control linked to gustation only applies to starved flies. 

      We agree with the reviewer's suggestion and have changed the title to "Taste triggers a homeostatic temperature control in hungry flies".

      (2) The authors used a temperature preference assay and refeeding for 5 minutes, 10 minutes, and 1 hour.

      Experimentally, it makes a difference if the flies are tested immediately after 10 minutes or at the same time point as flies allowed to feed for 1 hour. Is 10 minutes enough to change the internal state in a nutrition-dependent manner? Some of the authors' data hint at it (e.g. refeeding with fly food for 10 minutes), but it might be relevant to feed for 5/10 minutes and wait for 55/50 min to do the assays at comparable time points.

      Thank you for your suggestions. The temperature preference behavioral test itself takes 30 minutes from the time the flies are placed in the apparatus until the final choice is made. This means that after the hungry flies have been refed for 5 minutes, they will determine their preferred temperature within 35 minutes. It has been shown that insulin levels peak at 10 minutes and gradually decline (Tsao, et al., PLoS Genetics 2023). However, it is unclear how subtle changes in insulin levels affect behavior and how quickly the flies are able to consume food. These factors may contribute to temperature preference in flies. Therefore, to minimize "extraneous" effects, we decided to perform the behavioral assay immediately after the flies had eaten the food. We have noted in the Materials and Methods section why we chose this condition, based on the duration of the behavioral assay and the insulin effect.

      (3) A figure depicting the temperature preference assay in Figure 1 would help illustrate the experimental approach. It is also not clear why Figure 1E is shown instead of full statistics on the individual panels shown above (the data is the same). 

      We have revised Figure 1A and added statistics in Figure 1BCD. We also added a figure depicting the temperature preference assay (Fig. S1).

      (4) The authors state that feeding rate and amount were not changed with sucralose and glucose. However, the FLIC assay they employed does not measure consumption, so this statement is not correct, and it is unclear if the intake of sucralose and glucose is indeed comparable. This limits some of the conclusions. 

      We agree and removed “amount” and have revised the MS. 

      (5) The authors make a distinction between taste-induced and nutrient-induced warm preference. Yet the statistics in most figures only show the significance between the starved and refed flies, not the fed controls. As the recovery is in many cases incomplete and used as a distinction of nutritive vs nonnutritive signals (see Figure 1E) it will be important to also show these additional statistics to allow conclusions about how complete the recovery is. 

      We agree with the comments and have revised the MS and figures. 

      (6) The starvation period used is ranging from 1 to 3 days, as in some cases no effect was seen upon 1 day of starvation (e.g. with clock genes or temperature sensing neurons). While the authors do provide a comparison between 18-21 and 26-29 hours old flies in Figure S1, a comparison for 42-49 and 66-69 hours of starvation is missing. This also limits the conclusion as the "state" of the animal is likely quite different after 1 day vs. 3 days of starvation and, as stated by the authors, many flies die under these conditions.  

      We mainly used 2 overnights of starvation. Some flies (e.g. Ilp6 mutants) were completely healthy even after 2 overnights of starvation, so we had to starve them for 3 overnights. For example, Ilp6 mutants needed 3 overnights of starvation to show a significant difference in Tp between fed and starved flies. On the other hand, some flies (e.g. w1118 control flies) were very sick after 2 overnights of starvation, so we had to starve them for one overnight. Therefore, the starvation conditions used in this manuscript range from 1 to 3 overnights.

      First, we determined the starvation time by identifying the duration that resulted in a statistically significant Tp difference between fed and starved flies; as mentioned above, flies prefer lower temperatures when starvation is prolonged (Umezaki et al., Current Biology 2018). Therefore, if Tp was not statistically different between fed and starved flies, we extended the starvation time from 1 to 3 overnights. Importantly, we show in Fig. S3 that the duration of starvation did not affect the recovery effect. Furthermore, since control flies do not survive 42-49 or 66-69 hours of starvation, we cannot test the reviewer's suggestion. We have carefully documented the conditions in the Materials and Methods and figure legends.

      (7) In Figure 2, glucose-induced refeeding was not tested in Gr mutants or silenced animals, which would hint at post-ingestive recovery mechanisms related to nutritional intake. This is only shown later (in Figure S3) but I think it would be more fitting to address this point here. The data presented in Figure S3 regarding the taste-evoked vs nutrient-dependent warm preference is quite important while in some parts preliminary. It would nonetheless be justified to put this data in the main figures. However, some of the conclusions here are not fully supported, in part due to different and low n numbers, which due to the inherent variability of the behavior do not allow statistically sound conclusions. The authors claim that sweet GRNs are only involved in taste-induced warm preference, however, glucose is also nutritive but, in several cases, does not rescue warm preference at all upon removal of GRN function (see Figures S3A-C). This indicates that the Gal4 lines and also the involved GRs are potentially expressed in tissues/neurons required for internal nutrient sensing. 

      Thank you for your suggestion. We have added Figure S3ABC (glucose refeeding using Gr mutants and silenced animals) to Figure 2. The N numbers are not low, since each condition was tested more than 5 times, i.e. more than 100 flies were tested. Tp may vary, probably because of the effect of starvation on temperature preference.

      We did not intend to claim that "sweet GRNs are only involved in taste-induced warm preference". However, our writing may not have been clear enough. We agree that GRs may be expressed in tissues/neurons required for internal nutrient sensing. We have rewritten and revised the section.

      (8) In Figure 4, fly food and glucose refeeding do not fully recover temperature preference after refeeding. With the statistical comparison to the fed control missing, this result is not consistent with the statement made in line 252. I feel this is an important point to distinguish between state-dependent and taste/nutrition-dependent changes.  

      We inserted the statistics and compared between Fed and other conditions. 

      (9) The conclusion that clock genes are required for taste-evoked warm preference is limited by the observation that they ingest less sucralose. In addition, the FLIC assay does not allow conclusions about the feeding amount, only the number of food interactions. Therefore, I think these results do not allow clear-cut conclusions about the impact of clock genes in this assay.  

      We agree, and we have removed “amount” and revised the MS. The per01 mutants ate (touched) sucralose more often than glucose. On the other hand, tim01 mutants ate glucose more often than sucralose (Figure S6BC). However, these mutants still showed a similar Tp pattern for sucralose and glucose refeeding (Fig. 5CD). The results suggest that the tim01 flies eat enough sucralose relative to glucose that their food intake does not affect the Tp behavioral phenotype. We have rewritten and revised the section.

      (10) CPR is known to be influenced by taste, thought, smell, and sight of food. As the discussion focused extensively on the CPR link to flies it would be interesting to find out whether the smell and sight of food also influence temperature preference behavior in animals with different feeding states.  

      We have added data using Olfactory receptor co-receptor (Orco1) mutants, which lack olfaction, in Fig. S4. They failed to show the taste-evoked warm preference, but exhibited the nutrient-induced warm preference. Therefore, the data suggest that olfactory detection is also involved in taste-evoked warm preference. On the other hand, "seeing food" is probably more complicated, since light dramatically affects temperature preference behavior and the circadian clock that regulates temperature preference rhythms. Therefore, it is unlikely that a solid conclusion can be drawn from a short set of experiments. We will address this issue in the next study.

      (11) In the discussion in line 410ff the authors claim that "internal state is more likely to be associated with taste-evoked warm preference than nutrient-induced warm preference." This statement is not clear to me, as neuropeptides are involved in mediating internal state signals, both in the brain itself as well as from gut to brain. Thus, neuropeptidergic signals are also involved in nutrient-dependent state changes, the authors might just not have identified the peptides involved here. The global and developmental removal of these signals also limits the conclusions that can be drawn from the experiments, as many of these signals affect different states, circuits, and developmental progression.  

      We agree with the comments. We have removed the sentences and revised the MS.  

      Reviewer #2 (Public Review): 

      Animals constantly adjust their behavior and physiology based on internal states. Hungry animals, desperate for food, exhibit physiological changes immediately upon sensing, smelling, or chewing food, known as the cephalic phase response (CPR), involving processes like increased saliva and gastrointestinal secretions. While starvation lowers body temperature, the mechanisms underlying how the sensation of food without nutrients induces behavioral responses remain unclear. Hunger stress induces changes in both behavior and physiological responses, which in flies (or at least in Drosophila melanogaster) leads to a preference for lower temperatures, analogous to the hunger-driven lower body temperature observed in mammals. In this manuscript, the authors have used Drosophila melanogaster to investigate the issue of whether taste cues can robustly trigger behavioral recovery of temperature preference in starving animals. The authors find that food detection triggers a warm preference in flies. Starved flies recover their temperature preference after food intake, with a distinction between partial and full recovery based on the duration of refeeding. Sucralose, an artificial sweetener, induces a warm preference, suggesting the importance of food-sensing cues. The paper compares the effects of sucralose and glucose refeeding, indicating that both taste cues and nutrients contribute to temperature preference recovery. The authors show that sweet gustatory receptors (Grs) and sweet GRNs (Gustatory Receptor Neurons) play a crucial role in taste-evoked warm preference. Optogenetic experiments with CsChrimson support the idea that the excitation of sweet GRNs leads to a warm preference. The authors then examine the internal state's influence on taste-evoked warm preference, focusing on neuropeptide F (NPF) and small neuropeptide F (sNPF), analogous to mammalian neuropeptide Y. 
      Mutations in NPF and sNPF result in a failure to exhibit taste-evoked warm preference, emphasizing their role in this process. However, these neuropeptides appear not to be critical for nutrient-induced warm preference, as indicated by increased temperature preference during glucose and fly food refeeding in mutant flies. The authors also explore the role of hunger-related factors in regulating taste-evoked warm preference. Hunger signals, including diuretic hormone (DH44) and adipokinetic hormone (AKH) neurons, are found to be essential for taste-evoked warm preference but not for nutrient-induced warm preference. Additionally, insulin-like peptides 6 (Ilp6) and Unpaired3 (Upd3), related to nutritional stress, are identified as crucial for taste-evoked warm preference. The investigation then extends into circadian rhythms, revealing that taste-evoked warm preference does not align with the feeding rhythm. While flies exhibit a rhythmic feeding pattern, taste-evoked warm preference occurs consistently, suggesting a lack of parallel coordination. Clock genes, crucial for circadian rhythms, are found to be necessary for taste-evoked warm preference but not for nutrient-induced warm preference.

      Strengths: 

      A well-written and interesting study, investigating an intriguing issue. The claims, none of which to the best of my knowledge controversial, are backed by a substantial number of experiments. 

      Weakness: 

      The experimental setup used and the procedures for assessing the temperature preferences of flies are rather sparingly described. Additional details and data presentation would enhance the clarity and replicability of the study. I kindly request the authors to consider the following points: 

      i) A schematic drawing or diagram illustrating the experimental setup for the temperature preference assay would greatly aid readers in understanding the spatial arrangement of the apparatus, temperature points, and the positioning of flies during the assay. The drawing should also be accompanied by specific details about the setup (dimensions, material, etc). 

      Thank you for your suggestions. We have added the schematic drawing in Fig. S1.

      ii) It would be beneficial to include a visual representation of the distribution of flies within the temperature gradient on the apparatus. A graphical representation, such as heatmaps or histograms, showing the percentage of flies within each one-degree temperature bin, would offer insights into the preferences and behaviors of the flies during the assay. In addition to the detailed description of the assay and data analysis, the inclusion of actual data plots, especially for key findings or representative trials, would provide readers with a more direct visualization of the experimental outcomes. These additions will not only enhance the clarity of the presented information but also provide the reader with a more comprehensive understanding of the experimental setup and results. I appreciate the authors' attention to these points and look forward to the potential inclusion of these elements in the revised manuscript.

      Thank you for the advice. We have added heatmaps for the WT and Gr64f-Gal4>UAS-CsChrimson data in Fig. S2.
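The distribution the reviewer asks for (percentage of flies within each one-degree temperature bin) could be tabulated roughly as follows. This is a minimal sketch, not the authors' analysis code: the function names, the default gradient limits `t_min=18` and `t_max=32`, and the use of a simple mean as the preferred temperature are all assumptions made for illustration.

```python
import math
from collections import Counter


def percent_per_degree_bin(temps_c, t_min=18, t_max=32):
    """Return {bin_start_degC: percent of flies} for one-degree bins.

    temps_c holds the temperature (in degC) at each fly's final position.
    t_min and t_max are hypothetical gradient limits; flies outside
    [t_min, t_max) are ignored.
    """
    in_range = [t for t in temps_c if t_min <= t < t_max]
    counts = Counter(math.floor(t) for t in in_range)
    total = len(in_range)
    return {
        b: (100.0 * counts.get(b, 0) / total if total else 0.0)
        for b in range(t_min, t_max)
    }


def preferred_temperature(temps_c):
    """Mean temperature over all flies (an assumed definition of Tp)."""
    return sum(temps_c) / len(temps_c)


if __name__ == "__main__":
    # Made-up positions for five flies, for illustration only.
    temps = [24.2, 24.8, 25.1, 25.9, 26.5]
    print(percent_per_degree_bin(temps, t_min=24, t_max=27))
    print(round(preferred_temperature(temps), 2))
```

Stacking the per-bin percentages of each replicate as rows would give the kind of heatmap suggested for showing replicate-to-replicate variability.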

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript by Yujiro Umezaki and colleagues aims to describe how taste stimuli influence temperature preference in Drosophila. Under starvation flies display a strong preference for cooler temperatures than under fed conditions that can be reversed by refeeding, demonstrating the strong impact of metabolism on temperature preference. In their present study, Umezaki and colleagues observed that such changes in temperature preference are not solely triggered by the metabolic state of the animal but that gustatory circuits and peptidergic signalling play a pivotal role in gustation-evoked alteration in temperature preference. 

      The study of Umezaki is definitively interesting and the findings in this manuscript will be of interest to a broad readership. 

      Strengths: 

      The authors demonstrate interesting new data on how taste input can influence temperature preference during starvation. They propose how gustatory pathways may work together with thermosensitive neurons, peptidergic neurons and finally try to bridge the gap between these neurons and clock genes. The study is very interesting and the data for each experiment alone are very convincing. 

      Weaknesses: 

      In my opinion, the authors have opened many new questions but did not fully answer the initial question - how do taste-sensing neurons influence temperature preferences? What are the mechanisms underlying this observation? Instead of jumping from gustatory neurons to thermosensitive neurons to peptidergic neurons to clock genes, the authors should have stayed within the one question they were asking at the beginning. How does sugar sensing influence the physiology of thermosensation in order to change temperature preference? Before addressing all the following questions of the manuscript the authors should first directly decipher the neuronal interplay between these two types of neurons.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figure S3D is cited before S2, so please rearrange the numbering.

      Thank you. We have changed the numbering.

      I would also suggest a different color to visualize the data points in Figure S3, as some are barely visible on the dark bars (e.g. on a dark green background). 

      We have revised the figures. The data points were changed to smaller open circles.

      Reviewer #2 (Recommendations For The Authors): 

      *Please, expand on the experimental procedure, and describe the assay in detail. 

      We have added a scheme for the assay in Fig. S1 and also have revised the manuscript and figures.

      *Show the distribution of the gradient data that the preference values are based upon. Not necessarily for all, but for select key experiments. Heatmaps for each replicate (stacked on top of each other) would be a nice way of showing this. Simple histograms would of course work as well. 

      We have added heatmaps of selected key experiments in Fig. S2. We have revised the manuscript and figures correspondingly.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript by Yujiro Umezaki and colleagues aims at describing how taste stimuli influence temperature preference in Drosophila. Under starvation, flies display a strong preference for cooler temperatures than under fed conditions that can be reversed by refeeding, demonstrating the strong impact of metabolism on temperature preference. In their present study, Umezaki and colleagues observed that such changes in temperature preference are not solely triggered by the metabolic state of the animal but that gustatory circuits play a pivotal role in temperature preference. The study of Umezaki is definitively interesting and the findings in this manuscript will be of interest to a broad readership. However, I would like to draw the authors' attention to some points of concern:

      The title to me sounds somehow inadequate. The definition of homeostasis (Cambridge Dictionary) is as follows: "the ability or tendency of a living organism, cell, or group to keep the conditions INSIDE it the same despite any changes in the conditions around it, or this state of internal balance". What do the authors mean by homeostatic temperature control? Reading the title not knowing much about poikilotherm insects I would understand that the authors claim that Drosophila can indeed keep a temperature homeostasis as mammals do. As Drosophila is not a homoiotherm animal and thus cannot keep its body temperature stable the title should be amended.  

      Homeostasis means a state of balance between all the body systems necessary for the body to survive and function properly. Drosophila are ectotherms, so their heat comes from the environment, and their body temperature is very similar to that of their environment. However, the flies' temperature regulation is not simply a passive response to temperature. Instead, they actively seek a temperature based on their internal state. We have shown that the preferred temperature increases during the day and decreases during the night, showing a circadian rhythm of temperature preference (TPR). Because their environmental temperature is very close to their body temperature, TPR gives rise to body temperature rhythms (BTR). We have shown that TPR is similar to BTR in mammals (Kaneko et al., Current Biology 2012 and Goda et al., JBR 2023). Similarly, we showed that hungry flies choose a lower temperature so that the body temperature is also lower. Therefore, our data suggest that the fly maintains its homeostasis by using the environmental temperature to adjust its body temperature to an appropriate temperature depending on its internal state. Therefore, we would like to keep the title as "Taste triggers a homeostatic temperature control in hungry flies". We have added more explanation in the Introduction and Discussion.

      Accordingly, the authors compare the preference of flies to cooler temperatures to the reduced body temperature of mammals (Lines 64 - 65). However, according to the cited literature the reduced body temperature in starved rats is discussed to reduce metabolic heat production (Sakurada et al., 2000). The authors should more rigorously give a short summary of the findings in the cited papers and the original interpretation to help the reader not get confused.

      In flies, it has been shown that a lower temperature means a lower metabolic rate, and a higher temperature means a higher metabolic rate. Therefore, hungry flies choose a lower temperature where their metabolic rate is lower and they do not need as much heat.

      Similarly, in mammals, starvation causes a lower body temperature, hypothermia. Body temperature is controlled by the balance between heat loss and heat production. The starved mammals showed lower heat production. We have added this information to the introduction. 

      The authors show that 5 min fly food refeeding causes a partial recovery of the naïve temperature preference of the flies (Figure 1B) and that feeding of sucralose partially rescues the preference whereas glucose rescues the preference similar to refeeding with fly food would do. As glucose is both sweet and metabolically valuable it would be clearer for the reader if the authors start with the fly food experiment and then show the glucose experiment to show that the altered temperature preference depends on the food component glucose. From there they can further argue that glucose is both sweet (hedonic value) and metabolically valuable. And to disentangle sweetness from metabolism one needs a sugar that is sweet but cannot be metabolized - sucralose.

      Thank you for your advice. Since the data with sucralose is the one we want to highlight the most, we decided to present it in the order of sucralose, glucose, and fly food.

      In the sucralose experiment the authors omit the 5 min data point and only show the 10 min time point. As Figure 1F indicates that both Glucose and Sucralose elicit the same attractiveness in the flies and that sweetness influences the temperature preference, it is important that the authors show the 5 min temperature preference too to underline the effect of the sweet taste stimulus on the fly behavior independent from the caloric value. Further, the authors should demonstrate not only the cumulative touches but how much sucralose or glucose may already be consumed by the fly in the depicted time frames. 

      It is interesting to see how much sucralose or glucose the flies consume over the time frames shown. Although the cumulative exposure to sugar is ideally equivalent to the amount of sugar, we need a different way to actually measure the amount of sugar. We will now emphasize "cumulative touches" rather than "amount of sugar" in the text. In the next study, we will look at how much sucralose or glucose the fly has already consumed.

      Sucralose and Glucose have a similar molecular structure - it would be interesting to see how the sweet taste of a sugar with a different molecular structure like fructose and its receptor Gr43b (Miyamoto & Amrein 2014) may contribute to temperature preferences.

      Sucralose and Glucose are not structurally similar. That said, we tested fructose refeeding anyway. The hungry flies showed a taste-evoked warm preference after fructose refeeding. We have added data in Figure 1E and F. The data suggest that sweet taste is more important than sugar structure. We also tested Gr43b>CsChrimson. However, the flies do not show the taste-evoked warm preference (data not shown). The data suggest that Gr43b is not the major receptor controlling taste-evoked warm preference. We have revised the manuscript.

      Both sugars appear similarly attractive to the flies (Figure 1F) - are water, sucralose, and glucose presented in a choice assay or are these individually in separate experiments? 

      Water, sucralose, and glucose were individually presented in separate experiments. We clarified it in the figure legend.

      Subsequently, the authors address the question of how sweet taste may influence temperature preferences in flies. To this end, the authors first employ gustatory receptor mutants for Gr5a, Gr64a, and Gr61a and demonstrate that sucralose feeding does not rescue temperature preference in the absence of sweet taste receptors. In an alternative approach, the authors do not use mutants but an expression of UAS:Kir in Gr64F neurons. Taking a closer look at the graph it appears that the Kir expressing flies have a higher (by nearly 1°C) temperature preference than the starved mutant flies. Is this preference change related to the mutation directly and what would be the result if Kir would be conditionally only expressed after development is completed, or is the observed temperature preference related to the Gr64f-Gal4 line? If the latter would be the case perhaps the authors may want to bring the flies to the same genetic background to allow for a more direct comparison of the temperature preferences.

      The Gr64fGal4>Kir flies show a ~one degree higher preferred temperature under starvation compared to the mutants. However, the phenotype is similar to the controls, Gr64fGal4/+ flies, under starvation. Therefore, this phenotype is not due to either the mutation or the Kir effect. Most importantly, the Gr64fGal4>Kir flies failed to show a taste-evoked warm preference. Together with other mutant data, we concluded that sweet GRNs are required for taste-evoked warm preference.

      Overall, the figure legend for Figure 2 is very cryptic and should be more detailed.

      We have revised the figure legend for Figure 2. 

      To shed light on the mechanisms underlying the changes in temperature preferences through gustatory stimuli the authors next blocked heat and cold sensing neurons in fed and starved flies and found out that TrpA1 expressing anterior cells and R11F02-Gal4 expressing neurons both participate in sweetness-induced alteration of temperature preference in starved animals. At this point, it should be explicitly indicated in the figure that the flies need more than one overnight starvation to display the behavior (Figure 3A).

      We have revised the manuscript.

      The data provided by the authors indicate a kind of push-and-pull mechanism between heat and cold-sensing neurons under starvation that is somehow influenced by sweet taste sensing. Further, the authors demonstrate that TrpA1-as well as R11F02-Gal4 driven Chrimson activation is sufficient to partially rescue temperature preference under starvation. At this point is unclear why the authors use a tubGal80ts expression system but not for the TrpA1SH-Gal4 driven Chrimson. As the development itself and the conditions under which the animals were raised may have influence on the temperature preference it is important that both groups are equally raised if the authors want to directly compare with each other. 

      As we wrote in the Materials and Methods, the R11F02-Gal4>uas-CsChrimson flies died during development. Therefore, we had to use tubGal80ts. On the other hand, the TrpA1-Gal4>CsChrimson flies can survive to adulthood. As we mentioned in the manuscript, all flies were treated with ATR after they had fully developed into adults. This means that both TrpA1-Gal4 and R11F02-Gal4 expressing cells are activated by red light via CsChrimson only in adult stages. We carefully revised the manuscript.

      It is a pity that the authors at this point have decided to not deepen the understanding of the circuitry between thermo-sensation and metabolic homeostasis but subsequently change the focus of their study to investigate how internal state influences taste-evoked warm preference in hungry flies. Using mutants for NPF and sNPF the authors demonstrate that both peptides play a pivotal role in taste-evoked warm preference after sucrose feeding but not for nutrient-induced warm preference. Similarly, they found that DH44, AKH and dILP6, Upd2 and Upd3 neurons are also required for taste-evoked warm preference but not for nutrient-induced warm preference. Here again, the authors do not keep the systems stable and change between inhibition of neurons through Kir and mutants for peptides. For a better comparison, it would be preferable to use always exactly the same technique to inhibit neuron signalling.

      It would be interesting to find the neural circuitry linking thermo-sensation and metabolic homeostasis, but we have not had any luck so far. We will continue to look into the neural circuits that control taste-evoked warm preference and nutrient-induced warm preference. Because UAS-Kir is such a strong reporter, it sometimes kills the flies, so we could not use UAS-Kir for all Gal4 flies.

      DH44 is expressed in the brain and in the abdominal ganglion where they share the expression pattern with 4 Lk neurons per hemisphere. Seeing the impact of Lk signalling in metabolism (AlAnzi et al., 2010) the authors should provide evidence that the observed effect is indeed because of DH44 and not Lk.

      It would be interesting to see if Lk may play a role in taste-evoked warm preference and/or nutrient-induced warm preference. We would like to systematically screen which neuropeptides and receptors are involved in the behavior in the next study. 

      Seeing the results on dILP6 it is interesting that Li and Gong (2015) could show in larvae that cold-sensing neurons directly interact with dILP neurons in the brain. It would be interesting to see whether similar circuitry may exist in adult flies to regulate temperature preferences and these peptidergic neurons. Further, it appears interesting that again these animals need much longer time to display the observed shift in temperature (which again should be clearly indicated in the figure legend too). These observations should be more carefully considered in the discussion part too.

      We have revised the manuscript.

      In the last part of the study, the authors investigate how sensory input from temperature-sensitive cells may transmit information to central clock neurons and how these in turn may influence temperature preference under starvation. The experiments assume that DH44-expressing neurons play a role in the output pathway of the central clock. Using the clock gene null mutants per and tim the authors show that even though the animals display a significant starvation response neither per nor tim mutants exhibited taste-evoked warm preference, indicating a taste but not nutrient-evoked temperature preference regulation. 

      The authors demonstrate interesting new data on how taste input can influence temperature preference during starvation. They propose how gustatory pathways may work together with thermosensitive neurons, peptidergic neurons and finally try to bridge the gap between these neurons and clock genes. The study is very interesting and the data for each experiment alone are very convincing. However, in my opinion, the authors have opened many new questions but did not fully answer the initial question - how do taste-sensing neurons influence temperature preferences? What are the mechanisms underlying this observation? Instead of jumping from gustatory neurons to thermosensitive neurons to peptidergic neurons to clock genes, the authors should have stayed within the one question they were asking at the beginning. How does sugar sensing influence the physiology of thermo-sensation? Before addressing all the following questions of the manuscript the authors should first directly decipher the neuronal interplay between these two types of neurons.

      Thank you for your suggestion. It would be interesting to find the neural circuitry linking thermo-sensation and metabolic homeostasis. We have tried, but we have not had any luck so far.

      The authors could, e.g., employ Ca or cAMP-imaging in anterior or cold-sensitive cells and see how the responsiveness of these cells may be altered after sugar feeding. Or at least follow the idea of Li and Gong about the thermo-regulation of dILP-expressing neurons.

      Thank you for your suggestion. Since we do not know how dILP-expressing neurons are involved in the temperature response in adult flies, we will focus on these cells using calcium imaging in the next study.

      Anatomical analysis using the GRASP technique may further help to understand the interplay of these neurons and give new insights into the circuitry underlying food preference alteration under starvation. 

      Thank you for your suggestion. It would be interesting to find the neural circuitry linking thermo-sensation and metabolic homeostasis. We have tried, but we have not had any luck so far.

      Minor comments: 

      Line 51: Hungry animals are desperate for food - I think the authors should not anthropomorphize at this point too much but rather strictly describe how the animals change their behavior without any interpretation of the mental state of the animal.

      We have modified the manuscript.

      Line 80: Hunger and satiety dramatically affect animal behavior and physiology and control feeding - please not only cite the papers but also give a short overview of the cited papers on which behaviors are altered and how. 

      We have revised the manuscript. 

      Overall statistic: The authors do comparative statistics always against starved animals throughout but often state in the text a comparison against fed (Line 111: "but did not reach that of the fed flies"). I think the authors should describe the data according to their statistics and keep this constant throughout the paper.

      Sorry for the confusion. We originally had it, but we removed it. We have added the additional statistical analyses.  

      Figure legends: Overall the figure legends could be more developed and more detailed.

      We have revised the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      As adult-born granule neurons have been shown to play diverse roles, both positive and negative, to modulate hippocampal circuitry and function in epilepsy, understanding the mechanisms by which altered neurogenesis contributes to seizures is important for future therapeutic strategies. The work by Jain et al. demonstrates that increasing adult neurogenesis before status epilepticus (SE) leads to a suppression of chronic seizures in the pilocarpine model of temporal lobe epilepsy. This work is potentially interesting because previous studies showed suppressing neurogenesis led to reduced chronic seizures.

      To increase neurogenesis, the authors conditionally delete the pro-apoptotic gene Bax using a tamoxifen-inducible Nestin-CreERT2 which has been previously published to increase proliferation and survival of adult-born neurons by Sahay et al. After 6 weeks of tamoxifen injection, the authors subjected male and female mice to pilocarpine-induced SE. In the first study, at 2 hours after pilocarpine, the authors examine latency to the first seizure, severity and total number of acute seizures, and power during SE. In the second study in a separate group of mice, at 3 weeks after pilocarpine, the authors examine chronic seizure number and frequency, seizure duration, postictal depression, and seizure distribution/cluster seizures. Overall, the study concludes that increasing adult neurogenesis in the normal adult brain can reduce epilepsy in females specifically. However, important BrdU birthdating experiments in both male and female mice need to be included to support the conclusions made by the authors. Furthermore, speculative mechanisms lacking direct evidence reduce enthusiasm for the findings.

      There are two suggestions. First, BrdU birthdating of newborn neurons is important to add to the paper so that there is support for the conclusions. Second, speculative text reduced enthusiasm. In response, we clarified the conclusions. We do not think that the clarified conclusions require BrdU birthdating (discussed further below). We also removed two schematics (and associated text) that we think the reviewer was referring to when speculation was mentioned.

      We also want to point out something minor: the times of injections listed above are not correct.

      a. Seizures were not measured 2 hrs after pilocarpine; that is when the anticonvulsant diazepam was administered to males. 

      b. Seizures were not measured 3 weeks after pilocarpine; the duration of recording was 3 weeks.  

      (1) BrdU birthdating is required for conclusions.

      We think that the Reviewer was suggesting birthdating because we were not clear about our conclusions, and we apologize for the confusion. The Reviewer stated that we concluded: “conditionally deleting Bax in Nestin-Cre+ cells leads to increased neurogenesis and hilar ectopic granule cells, thereby reducing chronic seizures.”  (Note this is a quote from the review).

      However, we did not intend to conclude that. We intended to conclude that conditionally deleting Bax in Nestin-Cre+ mice reduced chronic seizures in the mouse model of epilepsy that we used. Also, that conclusion only pertained to females. Please note we did not conclude that hilar ectopic granule cells led to reduced seizures. We also concluded that Bax deletion increased neurogenesis in female mice. We have revised the text to make the conclusions clear.

      Abstract, starting on line 67:

      The results suggest that selective Bax deletion to increase adult neurogenesis can reduce experimental epilepsy, and the effect shows a striking sex difference.

      Results, starting on line 448:

      Because Cre+ epileptic females had increased numbers of immature neurons relative to Cre- females at the time of SE, and prior studies show that Cre+ females had less neuronal damage after SE (Jain et al., 2019), female Cre+ mice might have had reduced chronic seizures because of high numbers of immature neurons. However, the data do not prove a causal role.

      Starting on line 477:

      ...we hypothesized that female Cre+ mice would have fewer hilar ectopic GCs than female Cre- mice. However, female Cre+ mice did not have fewer hilar ectopic GCs.

      Discussion, starting on line 563:

      The chronic seizures, measured 4-7 weeks after pilocarpine, were reduced in frequency by about 50% in females. Therefore, increasing young adult-born neurons before the epileptogenic insult can protect against epilepsy. However, we do not know if the protective effect was due to the greater number of new neurons before SE or other effects. Past data would suggest that increased numbers of newborn neurons before SE leads to a reduced SE duration and less neuronal damage in the days after SE. That would be likely to lessen the epilepsy after SE. However, there may have been additional effects of larger numbers of newborn neurons prior to SE.

      Conclusions, starting on line 745:

      In the past, suppressing adult neurogenesis before SE was followed by fewer hilar ectopic GCs and reduced chronic seizures. Here, we show that the opposite - enhancing adult neurogenesis before SE and increased hilar ectopic GCs - do not necessarily reduce seizures. We suggest instead that protection of the hilar neurons from SE-induced excitotoxicity was critical to reducing seizures. The reason for the suggestion is that the survival of hilar neurons would lead to persistence of the normal inhibitory functions of hilar neurons, protecting against seizures. However, this is only a suggestion at the present time because we do not have data to prove it. Additionally, because protection was in females, sex differences are likely to have played an important role. Regardless, the results show that enhancing neurogenesis of young adult-born neurons in Nestin-Cre+ mice had a striking effect in the pilocarpine model, reducing chronic seizures in female mice.

      The Reviewer is correct that it would be interesting to know when the increase in adult neurogenesis occurred that was critical to the effect. For example, was it the initial increase following Bax deletion but before pilocarpine-induced SE, or the increase in neurogenesis following SE, or increased adult neurogenesis in the chronic stage of epilepsy? It also might be that related aspects of neurogenesis played a role, such as the degree to which maturation was normal in adult-born neurons. We have not pursued the experiments to identify these aspects of neurogenesis because of how much work it would entail. Also, approaches to conclude cause-effect relationships are going to be difficult.

      (2) Speculation.

      We removed the text and supplemental figures with schematics that we think were the overly speculative parts of the paper the Reviewer mentioned.

      Strengths:

      (1) The study is sex-matched and reveals differences in response to increasing adult neurogenesis in chronic seizures between males and females.

      (2) The EEG recording parameters are stringent, and the analysis of chronic seizures is comprehensive. In two separate experiments, the electrodes were implanted to record EEG from the cortex as well as the hippocampus. The recording was done for 10 hours post pilocarpine to analyze acute seizures, and for 3 weeks continuous video EEG recording was done to analyze chronic seizures.

      Weaknesses:

      (1) Cells generated during acute seizures have different properties to cells generated in chronic seizures. In this study, the authors employ two bouts of neurogenesis stimuli (Bax deletion dependent and SE dependent), with two phases of epilepsy (acute and chronic). There are multiple confounding variables to effectively conclude that conditionally deleting Bax in Nestin-Cre+ cells leads to increased neurogenesis and hilar ectopic granule cells, thereby reducing chronic seizures.

      As mentioned above, with a clarification of our conclusions we think we have addressed the concern. We believe that we conditionally deleted Bax in Nestin-expressing cells. We believe we found that female mice had reduced loss of hilar mossy cells and somatostatin-expressing neurons after SE, and fewer chronic seizures after SE. While it makes sense that increased neurogenesis caused the reduced seizures, we acknowledge it was not proved.

      We do not make conclusions about the role of hilar ectopic granule cells. However, we note that they appear to have been similar in number across groups, which suggests they played no role in the results. This is very surprising and therefore adds novelty.

      (2) Related to this is the degree of neurogenesis between Cre+ and Cre- mice and the nature of the sex differences. It is crucial to know the rate/fold change of increased neurogenesis before pilocarpine treatment and whether it is different between male and female mice.

      We agree that sex differences in adult neurogenesis could be shown by a sex difference in rate, fold change, maturation, and other characteristics. However, sex differences can also be shown by a change in doublecortin (DCX), which is what we did. We respectfully submit that we do not think an exhaustive study is critical.

      As a result, we have clarified DCX was studied either before SE or in the period of chronic seizures:

      Results, starting on line 406:

      III. Before and after epileptogenesis, Cre+ female mice exhibited more immature neurons than Cre- female mice but that was not true for male mice.

      Starting on line 446:

      Therefore, elevated DCX occurred after chronic seizures had developed in Cre+ mice but the effect was limited to females.

      Discussion, starting on line 592:

      This study showed that conditional deletion of Bax from Nestin-expressing progenitors increased young adult-born neurons in the DG when studied 6 weeks after deletion and using DCX as a marker of immature neurons.

      (3) The authors observe more hilar Prox1 cells in Cre+ mice compared to Cre- mice. The authors should confirm the source of the hilar Prox1+ cells.

      This is an excellent question but it is unclear that it is critical to the seizures since both sexes showed more hilar Prox1 cells in Cre+ mice but only the females had fewer seizures than Cre- mice. This is the additional text to describe the results (starting on Line 493):

      In past studies, hilar ectopic GCs have been suggested to promote seizures (Scharfman et al., 2000; Jung et al., 2006; Cho et al., 2015). Therefore, we asked if the numbers of hilar ectopic GCs correlated with the numbers of chronic seizures. When Cre- and Cre+ mice were compared (both sexes pooled), there was a correlation with numbers of chronic seizures (Fig. 6D1) but it suggested that more hilar ectopic GCs improved rather than worsened seizures. However, the correlation was only in Cre- mice, and when sexes were separated there was no correlation (Fig. 6D3).

      When seizure-free interval was examined with sexes pooled, there was a correlation for Cre+ mice (Fig. 6D2) but not Cre- mice. Strangely, the correlations of Cre+ mice with seizure-free interval (Fig. 6D2, D4) suggest ectopic GCs shorten the seizure-free interval and therefore worsen epilepsy, opposite of the correlative data for numbers of chronic seizures. In light of these inconsistent results it seems that hilar ectopic granule cells had no consistent effect on chronic seizures.

      (4) The biggest weakness is the lack of mechanism. The authors postulate a hypothetical mechanism to reconcile how increasing and decreasing adult-born neurons in GCL and hilus and loss of hilar mossy and SOM cells would lead to opposite effects - more or fewer seizures. The authors suggest the reason could be due to rewiring or no rewiring of hilar ectopic GCs, respectively, but do not provide clear-cut evidence.

      As we mention above, we removed the supplemental figures with schematics because they probably were what seemed overly speculative.

      We acknowledge that mechanism is not proven by our study. However, we would like to mention that in our view, showing preservation of hilar mossy cells and SOM cells, but not PV cells, does add mechanistic data to the paper. We understand more experiments are necessary.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Jain et al explore whether increasing adult neurogenesis is protective against status epilepticus (SE) and the development of spontaneous recurrent seizures (chronic epilepsy) in a mouse pilocarpine model of TLE. The authors increase adult neurogenesis via conditional deletion of Bax, a pro-apoptotic gene, in Nestin-CreERT2Baxfl/fl mice. Cre- littermates are used as controls for comparisons. In addition to characterizing seizure phenotypes, the authors also compare the abundance of hilar ectopic granule cells, mossy cells, hilar SOM interneurons, and the degree of neuronal damage between mice with increased neurogenesis (Cre+) vs Cre- controls. The authors find less severe SE and a reduction in chronic seizures in female mice with pre-insult increased adult-born neurons. Immunolabeling experiments show these females also have preservation of hilar mossy cells and somatostatin interneurons, suggesting the pre-insult increase in adult neurogenesis is protective.

      Strengths:

      (1) The finding that female mice with increased neurogenesis at the time of pilocarpine exposure have fewer seizures despite having increased hilar ectopic granule cells is very interesting.

      (2) The work builds nicely on the group's prior studies.

      (3) Apparent sex differences are a potentially important finding.

      (4) The immunohistochemistry data are compelling.

      (5) Good controls for EEG electrode implantation effects.

      (6) Nice analysis of most of the SE EEG data.

      Weaknesses:

      (1) In addition to the Cre- littermate controls, a no Tamoxifen treatment group is necessary to control for both insertional effects and leaky expression of the Nestin-CreERT2 transgene.

      About “leaky” expression, we have not found expression to be leaky. We checked by injecting a Cre-dependent virus so that mCherry would be expressed in those cells that had Cre.  The results were published as Supplemental Figure 9 in Jain et al. (2019).

      In the revised manuscript we also mention a study that examined three Nestin-CreERT2 mouse lines (Sun et al., 2014). One of the mouse lines was ours. The leaky expression was not in the mouse line we use. We have added these points to the revised manuscript:

      Methods, section II starting on line 791:

      Although Nestin-Cre-ERT2 mouse lines have been criticized because they can have leaky expression, the mouse line used in the present study did not (Sun et al., 2014), which we confirmed (Jain et al., 2019).

      (2) The authors suggest sex differences; however, experimental procedures differed between male and female mice (as the authors note). Female mice received diazepam 40 minutes after the first pilocarpine-induced seizure onset, whereas male mice did not receive diazepam until 2 hours post-onset. The former would likely lessen the effects of SE on the female mice. Therefore, sex differences cannot be accurately assessed by comparing these two groups, and instead, should be compared between mice with matching diazepam time courses.

      We agree that a shorter delay between pilocarpine and diazepam would be likely to lead to less damage. However, the latency from pilocarpine to SE varied, making the time from the onset of SE to diazepam variable. Most of the variability was in females. By timing the diazepam injection differently in males and females, we could make the time from the onset of SE to diazepam similar between females and males. We have added a supplemental figure to show that our approach led to no significant differences between females and males in the latency to SE, time between SE and diazepam injection, and time between pilocarpine and diazepam injection. We also show that Cre+ females and Cre- females were not different in these times, so it could not be related to the neuroprotection of Cre+ females.

      Additionally, the authors state that female mice that received diazepam 2 hours post-onset had severe brain damage. This is concerning as it would suggest that SE is more severe in the female than in the male mice.

      We regret that our language was misleading. We intended to say females had more morbidity and mortality than males (lack of appetite and grooming, death in the days after SE) when we gave DZP 2 hrs after Pilo. We actually don’t know why because there were no differences in severity of SE. We think the females had worse outcome when they had a short latency to SE.  These females had a longer period of SE before DZP than males, probably leading to worse outcome. To correct this we gave DZP to females sooner. Then morbidity and mortality was improved in females. 

      Interestingly, after we did this we saw females did not always have a short latency to SE. We maintained the same regimen, however, to be consistent. As the new supplemental figure (above) shows, there were no significant sex differences in the latency to SE, time between SE and DZP, or time between pilocarpine and DZP.

      (3) Some sample sizes are low, particularly when sex and genotypes are split (n=3-5), which could cause a type II statistical error.

      We agree and have noted this limitation in the Discussion:

      Additional considerations, starting on line 739:

      This study is limited by the possibilities of type II statistical errors in those instances where we divided groups by genotype and sex, leading to comparisons of 3-5 mice/group.
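
      One way to see why groups of 3-5 mice are vulnerable to type II errors is to look at the smallest p-value a test can return at those sample sizes. The sketch below is an illustration only: it assumes an exact two-sample permutation (or rank-sum) test with equal group sizes and no ties, which is not a test stated in the manuscript, and the helper name `min_two_sided_p` is hypothetical.

```python
from math import comb


def min_two_sided_p(n_per_group: int) -> float:
    """Smallest two-sided p-value of an exact two-sample permutation test.

    Assumes equal group sizes and no tied values. Of the comb(2n, n) ways to
    split the pooled animals into two groups, only two are as extreme as a
    complete separation of the groups (one in each direction).
    """
    return 2 / comb(2 * n_per_group, n_per_group)


# Hypothetical group sizes matching the 3-5 mice/group range in the text.
for n in (3, 4, 5):
    print(n, round(min_two_sided_p(n), 4))
# 3 0.1
# 4 0.0286
# 5 0.0079
```

      Under that assumption, 3 mice per group cannot reach p < 0.05 however large the effect is. Parametric tests have no such hard floor, but they can still have low power at these sample sizes.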

      (4) Several figures show a datapoint in the sex and genotype-separated graphs that is missing from the corresponding male and female pooled graphs (Figs. 2C, 2D, 4B).

      We are very grateful to the Reviewer for pointing out the errors. They are corrected.

      (5) In Suppl Figs. 1B & 1C, subsections 1c and 2c, the EEG trace recording is described as the end of SE; however, SE appears to still be ongoing in these traces in the form of periodic discharges in the EEG.

      The Reviewer is correct.  It is a misconception that SE actually ends completely. The most intense seizure activity may, but what remains is abnormal activity that can last for days. Other investigators observe the same and have suggested that it argues against the concept of a silent period between SE and chronic epilepsy. We had discussed this in our prior papers and had referenced how we define SE.  In the revised manuscript we add the information to the Methods section instead of referencing a prior study:

      Methods, starting on line 899:

      SE duration was defined in light of the fact that the EEG did not return to normal after the initial period of intense activity. Instead, intermittent spiking occurred for at least 24 hrs, as we previously described (Jain et al., 2019) and has been described by others (Mazzuferi et al., 2012; Bumanglag and Sloviter, 2018; Smith et al., 2018). We therefore chose a definition that captured the initial, intense activity. We defined the end of this time as the point when the amplitude of the EEG deflections was reduced to 50% or less of the peak deflections during the initial hour of SE. Specifically, we selected the time after the onset of SE when the EEG amplitude in at least 3 channels had dropped to approximately 2 times the amplitude of the EEG during the first hour of SE, and remained depressed for at least 10 min (Fig. S2 in Jain et al., 2019). Thus, the duration of SE was defined as the time between the onset and this definition of the "end" of SE.
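
      The amplitude criterion described above can be sketched in code. This is a minimal illustration, not the authors' analysis pipeline: the function name `find_se_end`, the per-minute amplitude envelope used as input, and the example values are all hypothetical. It applies the 50%-of-peak threshold, the 3-channel requirement, and the 10-min hold stated in the text.

```python
from typing import Optional, Sequence


def find_se_end(
    envelopes: Sequence[Sequence[float]],
    samples_per_min: int = 1,
    fraction: float = 0.5,
    min_channels: int = 3,
    hold_min: int = 10,
) -> Optional[int]:
    """Return the sample index (relative to SE onset) where SE "ends", or None.

    envelopes[c][t] is a hypothetical amplitude envelope for channel c at
    sample t, with t = 0 at the onset of SE.
    """
    first_hour = 60 * samples_per_min
    hold = hold_min * samples_per_min
    n_samples = min(len(ch) for ch in envelopes)

    # Per-channel threshold: a fraction of the peak amplitude in the first hour.
    thresholds = [fraction * max(ch[:first_hour]) for ch in envelopes]

    def depressed(t: int) -> bool:
        # True when enough channels are at or below their threshold at sample t.
        low = sum(ch[t] <= th for ch, th in zip(envelopes, thresholds))
        return low >= min_channels

    run_start = None
    for t in range(n_samples):
        if depressed(t):
            if run_start is None:
                run_start = t
            # The drop must persist for the full hold period to count.
            if t - run_start + 1 >= hold:
                return run_start
        else:
            run_start = None
    return None


# Four channels, one value per minute: 90 min of intense activity, then a drop.
channels = [[1.0] * 90 + [0.4] * 30 for _ in range(4)]
print(find_se_end(channels))  # 90
```

      Returning the start of the sustained drop, rather than the end of the 10-min hold, makes the SE duration equal to the returned index divided by `samples_per_min`.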

      (6) In Results section II.D and associated Fig.3, what the authors refer to as "postictal EEG depression" is more appropriately termed "postictal EEG suppression". Also, postictal EEG suppression has established criteria to define it that should be used.

      We find that "suppression" is typical in studies of ECT or humans (Esmaeili et al., 2023; Gascoigne et al., 2023; Hahn et al., 2023; Kavakbasi et al., 2023; Langroudi et al., 2023; Karl et al., 2024; Vilan et al., 2024; Zhao et al., 2024), whereas animal research uses the term "postictal depression" (Kanner et al., 2010; Krishnan and Bazhenov, 2011; Riljak et al., 2012; Singh et al., 2012; Carballosa-Gonzalez et al., 2013; Kommajosyula et al., 2016; Smith et al., 2018; Uva and de Curtis, 2020; Medvedeva et al., 2023). Therefore, we think depression is a more suitable term.

      The example traces in Fig. 3A and B should also be expanded to better show this potential phenomenon.

      We expanded traces in Fig. 3 as suggested. They are in Fig 3A.

      (7) In Fig.5D, the area fraction of DCX in Cre+ female mice is comparable to that of Cre- and Cre+ male mice. Is it possible that there is a ceiling effect in DCX expression that may explain why male Cre+ mice do not have a significant increase compared to male Cre- mice?

      We thank the Reviewer for the intriguing possibility. We now mention it in the manuscript:

      Results, starting on line 456:

      It is notable that the Cre+ male mice did not show increased numbers of immature neurons at the time of chronic seizures but Cre+ females did. It is possible that there was a “ceiling” effect in DCX expression that would explain why male Cre+ mice did not have a significant increase in immature neurons relative to male Cre- mice.

      (8) In Suppl. Fig 6, the authors should include DCX immunolabeling quantification from conditional Cre+ male mice used in this study, rather than showing data from a previous publication.

      We have made this revision.

      (9) In Fig 8, please also include Fluorojade-C staining and quantification for male mice.

      The additional data for males have been added to part D.

      (10) Page 13: Please specify in the first paragraph of the discussion that findings were specific to female mice with pre-insult increases in adult-born neurogenesis.

      This has been done.

      Minor:

      (11) In Fig. 1 and suppl. figure 1, please clarify whether traces are from male or female mice.

      We have clarified.

      (12) Please be consistent with indicating whether immunolabeling images are from female or male mice.

      a. Fig 5B images labeled as from "Cre- Females" and "Cre+ Females".

      b. Suppl. Fig 8: Images labeled as "Cre- F" and "Cre+ F".

      c. Fig 6: sex not specified.

      d. Fig. 7: sex only specified in the figure legend.

      e. Fig 8: only female mice were included in these experiments, but this is not clear from the figure title or legend.

      We revised all figures according to the comments.

      (13) Page 4: the last paragraph of the introduction belongs within the discussion section.

      We recognize there is a classic view that any discussion of Results should not be in the Introduction. However, we find that view has faded and more authors make a brief summary statement about the Results at the end of the Introduction. We would like to do so because it allows readers to understand the direction of the study at the outset, which we find is helpful.

      (14) Page 6: The sentence "The data are consistent with prior studies..." is unnecessary.

      We have removed the text.

      (15) Suppl. Fig 6A: Please include representative images of normal condition DCX immunolabeling.

      We have added these data. There is an image of a Cre- female, Cre+ female, Cre- male and Cre+ male in the new figure, Supplemental Figure 6. All mice had tamoxifen at 6 weeks of age and were perfused 6 weeks later. None of the mice had pilocarpine.

      (16) In Suppl. Fig 7C, I believe the authors mean "no loss of hilar mossy and SOM cells" instead of "loss of hilar mossy and SOM cells".

      This Figure was removed because of the input from Reviewer 1 suggesting it was too speculative.

      Reviewer #1 (Recommendations For The Authors):

      (1) The main claim of the study is that increasing adult neurogenesis decreases chronic seizures. However, to quantify adult-born neurons, DCX immunoreactivity is used as the sole metric to determine neurogenesis. This is insufficient as changes in DCX-expressing cells could also be an indicator of altered maturation, survival, and/or migration, not proliferation per se. To claim that increasing adult neurogenesis is associated with a reduction of chronic seizures, the authors should perform a pulse/chase (birth dating) experiment with BrdU and co-labeling with DCX.

      We think that increased DCX does reflect increased adult neurogenesis. However, we agree that one does not know if it was due to increased proliferation, survival, etc. We also note that this mouse line has been studied thoroughly to show there was increased neurogenesis with BrdU, Ki67 and DCX. We mention that paper in the revised text:

      Methods, starting on line 786:

      It was shown that after tamoxifen injection in adult mice there is an increase in dentate gyrus neurogenesis based on studies of bromo-deoxyuridine, Ki67, and doublecortin (Sahay et al., 2011).

      (2) As mentioned above, analysis of DCX staining alone months after TAM injections is limited. Instead, the cells could be labelled by BrdU prior to TAM injection, following which quantification of BrdU+/Prox1+ cells at 6 weeks post TAM injection should be performed in Cre+ and Cre- mice (males and females) to yield the rate of neurogenesis increase.

      We respectfully disagree that birthdating cells is critical. Using DCX staining just before SE, we know the size of the population of cells that are immature at the time of SE. This is what we think is most important because these immature neurons are those that appear to affect SE, as we have already shown.

      (3) To confirm the source of the hilar Prox1+ cells, a dual BrdU/EdU labeling approach would be beneficial. BrdU injection could be given before TAM injection and EdU injection before pilocarpine to label different cohorts of neural stem cells. Co-staining with Prox1 at different time points will help in identifying the origin of hilar ectopic cells.

      We are grateful for the ideas of the Reviewer. We hesitate to do these experiments now because it seems like a new study to find out where hilar granule cells come from.

      REFERENCES

      Bumanglag AV, Sloviter RS (2018) No latency to dentate granule cell epileptogenesis in experimental temporal lobe epilepsy with hippocampal sclerosis. Epilepsia 59:2019-2034.

      Carballosa-Gonzalez MM, Munoz LJ, Lopez-Alburquerque T, Pardal-Fernandez JM, Nava E, de Cabo C, Sancho C, Lopez DE (2013) EEG characterization of audiogenic seizures in the hamster strain gash:Sal. Epilepsy Res 106:318-325.

      Cho KO, Lybrand ZR, Ito N, Brulet R, Tafacory F, Zhang L, Good L, Ure K, Kernie SG, Birnbaum SG, Scharfman HE, Eisch AJ, Hsieh J (2015) Aberrant hippocampal neurogenesis contributes to epilepsy and associated cognitive decline. Nat Commun 6:6606.

      Esmaeili B, Weisholtz D, Tobochnik S, Dworetzky B, Friedman D, Kaffashi F, Cash S, Cha B, Laze J, Reich D, Farooque P, Gholipour T, Singleton M, Loparo K, Koubeissi M, Devinsky O, Lee JW (2023) Association between postictal EEG suppression, postictal autonomic dysfunction, and sudden unexpected death in epilepsy: Evidence from intracranial EEG. Clin Neurophysiol 146:109-117.

      Gascoigne SJ, Waldmann L, Schroeder GM, Panagiotopoulou M, Blickwedel J, Chowdhury F, Cronie A, Diehl B, Duncan JS, Falconer J, Faulder R, Guan Y, Leach V, Livingstone S, Papasavvas C, Thomas RH, Wilson K, Taylor PN, Wang Y (2023) A library of quantitative markers of seizure severity. Epilepsia 64:1074-1086.

      Hahn T et al. (2023) Towards a network control theory of electroconvulsive therapy response. PNAS Nexus 2:pgad032.

      Jain S, LaFrancois JJ, Botterill JJ, Alcantara-Gonzalez D, Scharfman HE (2019) Adult neurogenesis in the mouse dentate gyrus protects the hippocampus from neuronal injury following severe seizures. Hippocampus 29:683-709.

      Jung KH, Chu K, Lee ST, Kim J, Sinn DI, Kim JM, Park DK, Lee JJ, Kim SU, Kim M, Lee SK, Roh JK (2006) Cyclooxygenase-2 inhibitor, celecoxib, inhibits the altered hippocampal neurogenesis with attenuation of spontaneous recurrent seizures following pilocarpine-induced status epilepticus. Neurobiol Dis 23:237-246.

      Kanner AM, Trimble M, Schmitz B (2010) Postictal affective episodes. Epilepsy Behav 19:156-158.

      Karl S, Sartorius A, Aksay SS (2024) No effect of serum electrolyte levels on electroconvulsive therapy seizure quality parameters. J ECT 40:47-50.

      Kavakbasi E, Stoelck A, Wagner NM, Baune BT (2023) Differences in cognitive adverse effects and seizure parameters between thiopental and propofol anesthesia for electroconvulsive therapy. J ECT 39:97-101.

      Kommajosyula SP, Randall ME, Tupal S, Faingold CL (2016) Alcohol withdrawal in epileptic rats - effects on postictal depression, respiration, and death. Epilepsy Behav 64:9-14.

      Krishnan GP, Bazhenov M (2011) Ionic dynamics mediate spontaneous termination of seizures and postictal depression state. J Neurosci 31:8870-8882.

      Langroudi ME, Shams-Alizadeh N, Maroufi A, Rahmani K, Rahchamani M (2023) Association between postictal suppression and the therapeutic effects of electroconvulsive therapy: A systematic review. Asia Pac Psychiatry 15:e12544.

      Mazzuferi M, Kumar G, Rospo C, Kaminski RM (2012) Rapid epileptogenesis in the mouse pilocarpine model: Video-EEG, pharmacokinetic and histopathological characterization. Exp Neurol 238:156-167.

      Medvedeva TM, Sysoeva MV, Sysoev IV, Vinogradova LV (2023) Intracortical functional connectivity dynamics induced by reflex seizures. Exp Neurol 368:114480.

      Riljak V, Maresova D, Jandova K, Bortelova J, Pokorny J (2012) Impact of chronic ethanol intake of rat mothers on the seizure susceptibility of their immature male offspring. Gen Physiol Biophys 31:173-177.

      Sahay A, Scobie KN, Hill AS, O'Carroll CM, Kheirbek MA, Burghardt NS, Fenton AA, Dranovsky A, Hen R (2011) Increasing adult hippocampal neurogenesis is sufficient to improve pattern separation. Nature 472:466-470.

      Scharfman HE, Goodman JH, Sollas AL (2000) Granule-like neurons at the hilar/CA3 border after status epilepticus and their synchrony with area CA3 pyramidal cells: Functional implications of seizure-induced neurogenesis. J Neurosci 20:6144-6158.

      Singh B, Singh D, Goel RK (2012) Dual protective effect of passiflora incarnata in epilepsy and associated post-ictal depression. J Ethnopharmacol 139:273-279.

      Smith ZZ, Benison AM, Bercum FM, Dudek FE, Barth DS (2018) Progression of convulsive and nonconvulsive seizures during epileptogenesis after pilocarpine-induced status epilepticus. J Neurophysiol 119:1818-1835.

      Sun MY, Yetman MJ, Lee TC, Chen Y, Jankowsky JL (2014) Specificity and efficiency of reporter expression in adult neural progenitors vary substantially among nestin-creer(t2) lines. J Comp Neurol 522:1191-1208.

      Uva L, de Curtis M (2020) Activity- and ph-dependent adenosine shifts at the end of a focal seizure in the entorhinal cortex. Epilepsy Res 165:106401.

      Vilan A, Grangeia A, Ribeiro JM, Cilio MR, de Vries LS (2024) Distinctive amplitude-integrated EEG ictal pattern and targeted therapy with carbamazepine in kcnq2 and kcnq3 neonatal epilepsy: A case series. Neuropediatrics 55:32-41.

      Zhao C, Tang Y, Xiao Y, Jiang P, Zhang Z, Gong Q, Zhou D (2024) Asymmetrical cortical surface area decrease in epilepsy patients with postictal generalized electroencephalography suppression. Cereb Cortex 34.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Comment 1: One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors. Update: The authors say they have added a discussion of these papers, but I do not see it in the updated manuscript.

      We thank the reviewer for the suggestion. A discussion of these papers has been added (lines 557-565).

      Comment 2: The authors should report the results (exact data values) of female mice in the results text, or pool the male and female data if the sex differences are not significant.

      We agree with the reviewer. Some experiments were repeated in female mice, and the data from male and female mice are now reported in the Results text.

      Comment 3: The selectivity of drugs should be referred as "selective" rather than "specific". 

      Thanks, “specific” has been changed to “selective”.  

      Comment 4: Line 62: typo, "substantia nigra". 

      Thanks, “substantial nigra” has been changed to “substantia nigra” in line 65.  

      Comment 5: Line 77: some new studies suggest that NALCN might have voltage dependency (rectification).

      Thanks, the description of NALCN voltage dependence has been corrected in lines 81-83.

      Comment 6: Line 175: change "less" to "fewer". 

      Thanks, “less” has been changed to “fewer”.

      Comment 7: Line 299: choose one - "was not ... or" or "was neither ... nor". 

      Thanks, this error has been corrected. 

      Comment 8: In Figure 1Aii and Figure 3Bi, it was not specified in the results text or figure legend that C1-C5 represent individual cells until the legend for Figure 4.

      Thanks, these descriptions of the gels have been added to the figure legends.

      Reviewer #2 (Public Review): 

      Comment 1: From the previous review, we mentioned that " 'The HCN' as written in line 69 is a bit misleading, as HCN channels in the heart and brain are different members of a family of channels, although as written in the text, it seems that they are identical." This is still the case (now line 73).

      We agree with the reviewer’s comment. The introduction of HCN channels has been corrected (lines 74-78).

      Comment 2: The authors state in line 112 that "most of the experiments were also repeated in female mice" - this is true in the case of most electrophysiological experiments, although not behavioral experiments. Authors should amend the statement in line 112 and clarify in the Discussion section which findings are generalizable between sexes; e.g.:

      a.  Discussion of HCN contribution to VTA DA activity (beginning line 453) should clarify male mice. 

      b.  Similarly, any discussion of behavioral findings should clarify male mice. 

      We agree with the reviewer’s comments. The sex of the mice used is now noted in the Results and Discussion.

      Comment 3: The authors' statement in lines 179-183 ("In contrast, fewer GABAergic neuronal markers (Glutamic acid decarboxylase, GAD1/2 and vesicular GABA transporter, VGAT) co-expressed with the DA neurons, which is consistent with previous studies that VTA DA neurons co-expressing GABAergic neuronal markers mainly project to the lateral habenula") is a little confusing - as stated, it seems that the authors are confirming DA/GABA coexpression in VTA-LHb neurons, which is not the case.

      We agree with the reviewer’s comment. We corrected this statement (lines 182-186).

      Comment 4: Additional information could be included in the Methods section description of Western Blotting procedures - e.g., what thickness of tissue and what size gauge were used to dissect VTA for these experiments?

      Thanks. The description of tissue in Western Blotting procedures has been added.

      Comment 5:

      a. Grammatical errors in line 23 of Abstract (also lines 31-32)

      b. "drove" should read "strove" in line 92 

      c. Grammatical errors in lines 401, 444, and 448 

      We thank the reviewer for pointing out grammatical errors and we corrected them.

      Reviewer #3 (Public Review): 

      Comment 1: The main strength of this study lies in a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. These tasks mainly address anxiety-like behaviors and so-called depression-like behaviors (sucrose choice, forced swim test, tail suspension test). The results gathered by means of these procedures are clear-cut. However, the reviewer believes that the authors should be more cautious when interpreting immobility responses to stress (forced swim, tail suspension) as "depression-like" responses. These stress models have been routinely used (and validated) in the past to detect the antidepressant properties of compounds under investigation, which by no means indicates that these are depression models. For readers interested in this debate, I suggest reading e.g. De Kloet and Molendijk (Biol. Psychiatry 2021).

      We thank the reviewer for the suggestion. We will be more careful and rigorous in the selection of stress models in our subsequent research work.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We have added the full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals into the results and the figure legends of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the Reviewers and Editors for the constructive comments, which we believe have significantly improved the quality of our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.

      In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).

      The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish, it is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.

      Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.

      We thank Reviewer #1 for this pertinent comment and the opportunity to address this issue. A very similar concern was also raised by Reviewer #2. Below we try to clarify the motivations that led us to predict that the hypothesized long-term predictions should manifest at the onset (and not within or at the end) of a perceptual chunk.

      Reviewers #1 and #2 contest a critical assumption of our study, i.e., the fact that long-term predictions should occur at the beginning of a rhythmic chunk as opposed to its completion. They also contest the prediction deriving from this view, i.e., that omitting the first sound in a perceptual chunk (short for Spanish, long for Basque) would lead to larger error responses than omitting a later element. They suggest an alternative view: the omission of tones at the end of a perceptual rhythmic chunk would evoke larger error responses than omissions at its onset, as subjects are more likely to predict the completion of the chunk than its beginning. This view predicts an interaction effect in the opposite direction of our findings.

      While we acknowledge this as a plausible hypothesis, we believe that the current literature provides strong support for our view. Indeed, many studies in the rhythm and music perception literature have investigated the ERP responses to deviant sounds and omissions placed at different positions within rhythmic patterns (e.g., Ladinig et al., 2009; Bouwer et al., 2016; Brochard et al., 2003; Potter et al., 2009; Yabe et al., 2001). For instance, Ladinig et al. (2009) presented participants with metrical rhythmical sound sequences composed of eight tones. In some deviant sequences, the first or a later tone was omitted. They found that earlier omissions elicited earlier and higher-amplitude MMN responses than later omissions (irrespective of attention). Overall, this and other studies showed that the amplitude of ERP responses is larger when deviants occur at positions that are expected to be the “start” of a perceptual group - “on the beat” in musical terms - and declines toward the end of the chunk. According to some of these studies, the first element of a chunk is particularly important to track the boundaries of temporal sequences, which is why more predictive resources are invested at that position. We believe that this body of evidence provides a robust basis for our hypotheses and the directionality of our predictions.

      An additional point that should be considered concerns the amplitude of the prediction error response elicited by the omission. From a predictive coding perspective, the omission of the onset of a chunk should elicit larger error responses because the system is expecting the whole chunk (i.e., two tones/more acoustic information). On the other hand, the omission of the second tone - in the transition between two tones within the chunk - should elicit a smaller error response because the system is expecting only the missing tone (i.e. less acoustic information). 

      Given the importance of these points, we have now included them in the updated version of the paper, in which we try to better clarify the rationale behind our hypothesis (see Introduction section, around the 10th paragraph).

      (2) The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response.

      We thank the Reviewer for this comment. As mentioned in the provisional response, the approach employed to identify the presence of an interaction effect was conservative: We utilized a non-parametric test on combined gradiometers data, without making a priori assumptions about the location of the effect, and employed small cluster thresholds (cfg.clusteralpha = 0.05) to increase the chances of detecting highly localized clusters with large effect sizes. The fact that the interaction effect arises in a relatively small cluster of sensors does not alter its statistical robustness. It should be also considered that in the present analyses we focused on planar gradiometer data that, compared to magnetometers and axial gradiometers, present more fine-grained spatial resolution and are more suited for picking up relatively small effects. 

      The partial overlap of the cluster with the activation peaks may simply reflect the fact that different sources contribute to the generation of the omission-MMN, which has been reported in several studies (e.g., Zhang et al., 2018; Ross & Hamm, 2020).  We value the Reviewer’s input and are grateful for the opportunity to address these considerations.

      Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones). Because of these points, it is difficult to interpret this interaction as a modulation of the omission response.

      We thank the Reviewer for the comment and appreciate the opportunity to address these concerns. We have re-evaluated the boxplot in Figure 2E and want to clarify that the two participants mentioned by Reviewer #1, despite being somewhat distant from the rest of the group, are not outliers according to the standard Tukey’s rule. As shown in the figure below, no participant fell outside the upper (Q3+1.5xIQR) and lower whiskers (Q1-1.5xIQR) of the boxplot. 
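
      The Tukey's rule check described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and the sample values are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile convention used by a given plotting package.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def tukey_outliers(values, k=1.5):
    """Return the values that fall strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

if __name__ == "__main__":
    # Hypothetical amplitudes, for illustration only (not data from the study).
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(tukey_fences(sample))
    print(tukey_outliers(sample))
```

      A point that looks distant from the rest of the group is flagged only if it lies beyond one of the two fences, which is the criterion applied to Figure 2E above.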

      Moreover, we believe that the presence of a main effect of omission type does not impact the interpretation of the interaction, especially considering that these effects emerge over distinct clusters of channels (see Fig. 1 C; Supplementary Fig. 2 A). 

      Based on these considerations - and along with the evidence collected in the control study and the source reconstruction data reported in the new version of the manuscript - we find it unlikely that the interaction effect is driven by outliers or by a main effect of omission type. We appreciate the opportunity provided by the Reviewer to address these concerns, as we believe they strengthen the claim that the observed effect is driven by the hypothesized long-term linguistic priors rather than uncontrolled group differences.

      Author response image 1.

      It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.

      We appreciate the Reviewer’s suggestion to incorporate more comprehensive source analyses. In the new version of the paper, we performed new analyses on the source data using a new atlas with more fine-grained parcellations of the regions of interest (ROIs) (Brainnetome atlas; Fan et al., 2016) and focusing on peak activity to increase sensitivity in space and time. We therefore invite the Reviewer to read the updated part on source reconstruction included in the Results and Methods sections of the paper.

      Reviewer #1 (Recommendations For The Authors):

      While I have described my biggest concerns with respect to this work in the public review, here I list more specific points that I hope will help to improve the manuscript. Some of these are very minor, but I hope you will still find them constructive. 

      (1) I understand the difficulties implied in recruiting subjects from two different linguistic groups, but with 20 subjects per group and a between-groups design, the current study is somewhat underpowered. A post-hoc power analysis shows an achieved power of 46% for medium effect sizes (d = 0.5, and alpha = 0.05, one-sided test). A sensitivity analysis shows that the experiment only has 80% power for effect sizes of d = 0.8 and above. It would be important to acknowledge this limitation in the manuscript. 

      We thank the Reviewer for reporting these analyses. It must be noted that our effect of interest was based on Molnar et al.’s (2016) behavioral experiment, in which a sample size of 16 subjects per group was sufficient to detect the perceptual grouping effect. In Yoshida et al. (2010), the perceptual grouping effect emerged with two groups of 20 7–8-month-old Japanese and English-learning infants. Based on these previous findings, we believe that a sample size of 20 participants per group can be considered appropriate for the current MEG study. We clarified these aspects in the Participants section of the manuscript, in which we specified that previous behavioral studies detected the perceptual grouping effect with similar sample sizes. Moreover, to acknowledge the limitation highlighted by the Reviewer, we also included the power and sensitivity analyses in a note in the same section (see note 2 in the Participants section).
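
      The power and sensitivity figures discussed above can be roughly cross-checked with a short calculation. This is a sketch under stated assumptions, not the Reviewer's or the authors' actual procedure: it treats the design as a one-sided, two-sample comparison with 20 participants per group and uses a normal approximation instead of the exact noncentral t distribution, so its results should land close to, but not exactly on, the reported values. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample test for effect size d.

    Normal approximation: the test statistic is treated as N(delta, 1),
    with delta = d * sqrt(n_per_group / 2) for two equal-sized groups.
    """
    delta = d * sqrt(n_per_group / 2)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - _STD_NORMAL.cdf(z_crit - delta)

def approx_detectable_d(power, n_per_group, alpha=0.05):
    """Approximate smallest effect size d detectable with the given power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) * sqrt(2 / n_per_group)

if __name__ == "__main__":
    # Settings taken from the comment above: d = 0.5, alpha = 0.05, n = 20.
    print(f"approx. power for d = 0.5: {approx_power(0.5, 20):.2f}")
    print(f"approx. d at 80% power:   {approx_detectable_d(0.80, 20):.2f}")
```

      An exact calculation based on the noncentral t distribution gives slightly lower power than this approximation, because it accounts for estimating the variance from the sample.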

      (2) All the line plots in the manuscript could be made much more informative by adding 95% CI bars. For example, in Figure 4A, the omission response for the long tone departs from the one for the short tone very early. Adding CIs would help to assess the magnitude of that early difference. Error bars are present in Figure 3, but it is not specified what these bars represent. 

      Thanks for the comments. We added the explanation of the error bars in the new version of Figure 3. For the remaining figures, we prefer maintaining the current version of the ERF, as the box-plots accompanying them provide information about the distribution of the effect across participants.

      (3) In the source analysis, there is only mention of an interaction trend in the left auditory cortex, but no statistics are presented. If the authors prefer to mention such a trend, I think it would be important to provide its stats to allow the reader to assess its relevance. 

      We performed new analyses on the source data, all of which are reported in the updated version of the manuscript.

      (4) In the discussion section, the authors refer to the source analysis and state that "the interaction is evident in the left". But if only a statistical trend was observed, this statement would be misleading. 

      We agree with this comment. We invite the Reviewer to check the new part on source reconstruction, in which we perform contrasts going in the same direction as the sensor-level data.

      (5) In the discussion the authors argue that "This result highlights the presence of two distinct systems for the generation of auditory" that operate at different temporal scales, but the current work doesn't offer evidence for the existence of two different systems. The effects of long-term priors and short-term priors presented here are not dissociated and instead sum up. It remains possible that a single system is in place, collecting statistics of stimuli over a lifetime, including the statistics experienced during the experiment. 

      Thanks for pointing that out. We changed the sentence above as follows: “This result highlights the presence of an active predictive system that relies on natural sound statistics learned over a lifetime to process incoming auditory input”.

      (6) In the discussion, the authors acknowledge that the omission response has been interpreted both as pure prediction and as pure prediction error. Then they declare that "Overall, these findings are consistent with the idea that omission responses reflect, at least in part, prediction error signals.". However an argument for this statement is not provided. 

      Thanks for pointing out this lack of argument. In the new version of the manuscript, we explained our rationale as follows: “Since sensory predictive signals primarily arise in the same regions as the actual input, the activation of a broader network of regions in omission responses compared to tones suggests that omission responses reflect, at least in part, prediction error signals”.

      (7) In the discussion the authors present an alternative explanation in which both groups might devote more resources to the processing of long events, because these are relevant content words. Following this, they argue that "Independently on the interpretation, the lack of a main effect of omission type in the control condition suggests that the long omission effect is driven by experience with the native language." However as there was no manipulation of duration in the control experiment, a lack of the main effect of omission type there does not rule out the alternative explanation that the authors put forward. 

      This is correct; thanks for noticing it. We removed the sentence above to avoid ambiguities.

      Minor points: 

      (8) The scale of the y-axis in Figure 2C might be wrong, as it goes from 9 to 11 and then to 12. If the scale is linear, the top value should be 13, or the bottom value should be 10. 

      Figure 2C has been modified accordingly, thanks for noticing the error.

      (9) There is a very long paragraph starting on page 7 and ending on page 8. Toward the end of the paragraph, the analysis of the control condition is presented. That could start a new paragraph.

      Thanks for the suggestion. We modified the manuscript as suggested.

      Reviewer #2 (Public Review):

      (1) Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.

      Thanks for the suggestion. A similar point was also advanced by Reviewer 1. In general, we believe our work speaks to the predictive nature of such experience-dependent effects, and shows that these linguistic priors shape sensory processes at very early stages. This is discussed in the sixth and seventh paragraphs of the Discussion section. In the new version of the article, we modified some statements and tried to make them more coherent with the scope of the present work. For instance, we replaced “This result highlights the presence of two distinct systems for the generation of auditory predictive models, one relying on the transition probabilities governing the recent past, and another relying on natural sound statistics learned over a lifetime” with “This result highlights the presence of an active predictive system that relies on natural sound statistics learned over a lifetime to process incoming auditory input”.

      (2) Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.

      A similar point was advanced by Reviewer #1. We tried to clarify the rationale behind our hypothesis. Please refer to the response provided to the first comment of Reviewer #1 above.

      (3) In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.

      We thank the Reviewer for this comment. We explored in more detail the possibility that the aligned phase may differ depending on linguistic background, which is indeed a very interesting hypothesis. In the phase analyses reported below we focused on the instantaneous phase angle time locked to the onset of short and long tones presented in the experiment.

      In short, we extracted two-second time intervals centered on the onset of the tones for each participant (~200 trials per condition) and, using a wavelet transform (implemented in FieldTrip's ft_freqanalysis), targeted the 0.92 Hz frequency that corresponds to the presentation rate of our pairs of tones. We extracted the phase angle for each time point and, using the circular statistics toolbox implemented in MATLAB, computed the Rayleigh z-scores across the whole sensor space for each tone (long and short) and group (Spanish (Spa) dominants and Basque (Eus) dominants). This method evaluates the instantaneous phase clustering at a specific time point, and thus the presence of a specific oscillatory pattern at the onset of a given tone.
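
      As an illustration of the phase-clustering measure described above, the following Python sketch computes the mean resultant length R and the Rayleigh z statistic (z = n·R²) from a set of single-trial phase angles. It is not the authors' pipeline, which used FieldTrip and a MATLAB circular statistics toolbox; the function name `rayleigh_z` and the synthetic phases are assumptions made for this example.

```python
import numpy as np


def rayleigh_z(phases):
    """Return (R, z) for a 1-D array of phase angles in radians.

    R is the mean resultant length (0 = no clustering, 1 = identical phases)
    and z = n * R**2 is the Rayleigh statistic.
    """
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    # Average the unit vectors exp(i * phase) and take the magnitude.
    r = np.abs(np.mean(np.exp(1j * phases)))
    return r, n * r ** 2


# Hypothetical data: 200 trials whose phase at tone onset clusters around pi/4,
# compared with 200 trials whose phase is uniformly distributed.
rng = np.random.default_rng(0)
clustered = rng.vonmises(mu=np.pi / 4, kappa=2.0, size=200)
uniform = rng.uniform(-np.pi, np.pi, size=200)

r_clustered, z_clustered = rayleigh_z(clustered)
r_uniform, z_uniform = rayleigh_z(uniform)
print(f"clustered: R={r_clustered:.2f}, z={z_clustered:.1f}")
print(f"uniform:   R={r_uniform:.2f}, z={z_uniform:.1f}")
```

      In an analysis like the one described, such a statistic would be computed per sensor, tone type, and group at the time point of interest; larger z values indicate stronger phase alignment across trials.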

      Author response image 2.

      Here we observe that the phase clustering was stronger in the right sensors for both groups. The critical point is to evaluate the phase angle (in radians) for the two groups and the two tones and see if there are statistical differences. We focused first on the sensor with higher clustering (right temporal MEG1323) and observed very similar phase angles for the two groups both for long and short tones (see image below). We then focused on the four left fronto-temporal sensor pairs that showed the significant interaction: here we observed one sensor (MEG0412) with different effects for the two groups (interaction group by tone was significant, p=0.02): for short tones the “Watson (1961) approximation U2 test” showed a p-value of 0.11, while for long tones the p-value was 0.03 (after correction for multiple comparisons). 

      Overall, the present findings suggest a tendency for the two groups to phase-align differently to long and short tones over the left fronto-temporal region. However, the effect could be detected only in one gradiometer sensor and it was not statistically robust. The effect in the right hemisphere was statistically more robust, but it was not sensitive to group language dominance. 

      Due to the inconclusive nature of these analyses regarding the role of language experience in shaping the phase alignment to rhythmic sound sequences, we prefer to keep these results in the public review rather than incorporating them in the article. Nonetheless, we believe that this decision does not undermine the main finding that the group differences in the MMN amplitude are driven by long-term predictions, especially in light of the many studies indicating the MMN as a putative index of prediction error (e.g., Bendixen et al., 2012; Heilbron and Chait, 2018). Moreover, as suggested in the preliminary reply, although evoked responses and oscillations are often considered distinct electrophysiological phenomena, current evidence suggests that these phenomena are interconnected (e.g., Studenova et al., 2023). In our view, the hypotheses that the MMN reflects differences in phase alignment and long-term prediction errors are not mutually exclusive.

      Author response image 3.

      (4) Source localization is performed on sensor-level significant data. The lack of source-level statistics weakens the conclusions that can be extracted. Furthermore, only the source reflecting the interaction pattern is taken into account in detail as supporting their hypotheses, overlooking other sources. Also, the right IFG source activity is not depicted, but looking at whole brain maps seems even stronger than the left. To sum up, source localization data, as informative as it could be, does not strongly support the author's claims in its current state. 

      A similar comment was also advanced by Reviewer #1 (comment 2). We appreciate the suggestion to incorporate more comprehensive source analyses. In the new version of the paper, we performed new analyses on the source data using a new atlas with more fine-grained parcellations of the ROIs, and focusing on peak activity to increase the sensitivity of the response in space and time. We therefore invite the Reviewer to read the updated part on source reconstruction included in the Results and Methods sections of the paper. 

      In the article, we report only the source reconstruction data from ROIs in the left hemisphere, because it is there that the interaction effect arises at the sensor level. However, we also explored the homologous regions in the right hemisphere, as requested by the Reviewer. A cluster-based permutation test focusing on the interaction between language group and omission type was performed on both the right STG and IFG data. No significant interaction emerged in either of these regions. Below is a plot of the source activity time series over ROIs in the right STG and IFG. 

      Author response image 4.

      Reviewer #2 (Recommendations For The Authors):

      In this set of private recommendations for the authors, I will outline a couple of minor comments and try to encourage additional data analyses that, in my opinion, would strengthen the evidence provided by the study. 

      (1) As I noted in the public review, I believe an oscillatory analysis of the data would, on one hand, provide stronger support for the behavioral effect of rhythmic perceptual organization given the lack of behavioral direct evidence; and, on the other hand, provide evidence (to be discussed if so) for a role of entrained oscillation phase in explaining the different pattern of omission responses. One analysis the authors could try is to measure the phase angle of an oscillation, the frequency of which relates to the length of the binary pattern, at the onset of short and long tones, separately, and compare it across groups. Also, single trials of omission responses could be sorted according to that phase. 

      Thanks for the suggestion. Please see phase analyses reported above.

      (2) I wonder why source activity for the right IFG was not shown. I urge the authors to provide and discuss a more complete picture of the source activity found. Given the lack of source statistics (which could be performed), I find it a must to give an overall view. I find it so because I believe the distinction between perceptual grouping effects due to inherent acoustic differences across languages or semantic differences is so interesting. 

      Thanks again for the invitation to provide a more complete picture of the source activity data. As mentioned in the response above, we invite the Reviewer to read the new related part included in the Results and Methods sections of the paper. In our updated source reconstruction analysis, we find that some regions around the left STG show a pattern that resembles the one found at the sensor-level, providing further support for the “acoustic” (rather than syntactic/semantic) nature of the effect. 

      We did not report ROI analysis on the right hemisphere because the interaction effect at sensor level emerged on the left hemisphere. Yet, we included a summary of this analysis in the public response above. 

      (3) Related to this, I have to acknowledge I had to read the whole Molnar et al. (2016) study to find the only evidence so far that, acoustically, in terms of sound duration, Basque and Spanish differ. This was hypothesized before but only at Molnar, an acoustic analysis is performed. I think this is key, and the authors should give it a deeper account in their manuscript. I spend my review of this study thinking, well, but when we speak we actually bind together different words and the syllabic structure does not need to reflect the written one, so maybe the effect is due to a high-level statistical prior related to the content of the words... but Molnar showed me that actually, acoustically, there's a difference in accent and duration: "Taken together, Experiments 1a and 1b show that Basque and Spanish exhibit the predicted differences in terms of the position of prosodic prominence in their phonological phrases (Basque: trochaic, Spanish: iambic), even though the acoustic realization of this prominence involves not only intensity in Basque but duration, as well. Spanish, as predicted, only uses duration as a cue to mark phrasal prosody." 

      Thanks for the suggestion, the distinction in terms of sound duration in Spanish and Basque reported by Molnar is indeed very relevant for the current study. 

      We added a few sentences to highlight the acoustic analysis by Molnar and the consequent acoustic nature of the reported effect.

      In the introduction: “Specifically, the effect has been proposed to depend on the quasiperiodic alternation of short and long auditory events in the speech signal – reported in previous acoustic analyses (Molnar et al., 2016) – which reflect the linearization of function words (e.g., articles, prepositions) and content words (e.g., nouns, adjectives, verbs).”

      In the discussion, paragraph 3, we changed “We hypothesized that this effect is linked to a long-term “duration prior” originating from the syntactic function-content word order of language, and specifically, from its acoustic consequences on the prosodic structure” with “We hypothesized that this effect is linked to a long-term “duration prior” originating from the acoustic properties of the two languages, specifically from the alternation of short and long auditory events in their prosody”.

      In the discussion, end of paragraph eight: “The reconstruction of cortical sources associated with the omission of short and long tones in the two groups showed that an interaction effect mirroring the one at the sensor level was present in the left STG, but not in the left IFG (fig. 3, B, C, D). Pairwise comparisons within different ROIs of the left STG indicated that the interaction effect was stronger over primary (BA 41/42) rather than associative (BA 22) portions of the auditory cortex. Overall, these results suggest that the “duration prior” is linked to the acoustic properties of a given language rather than its syntactic configurations”.

      Now, some minor comments: 

      (1) Where did the experiments take place? Were they in accordance with the Declaration of Helsinki? Did participants give informed consent? 

      All the requested information has been added to the updated version of the manuscript. Thanks for pointing this out.

      (2) The fixed interval should be called inter-stimulus interval. 

      Thanks for pointing this out. We changed the wording as suggested.

      (3) The authors state that "Omission responses allow to examine the presence of putative error signals decoupled from bottom-up sensory input, offering a critical test for predictive coding (Walsh et al 2020, Heilbron and Chait, 2018).". However the way omission responses are computed in their study is by subtracting the activity from the previous tone. This necessarily means that in the omission activity analyzed, there's bottom-up sensory input activity. As performing another experiment with a control condition in which a sequence of randomly presented tones with different durations to compare directly the omission activity in both sequences (experimental and control) is possibly too demanding, I at least urge the authors to incorporate the fact that their omission responses do reflect also tone activity. And consider, for future experiments, the inclusion of further control conditions. 

      Thanks for the opportunity to clarify this aspect. Actually, the way we computed the omission MMN is not by subtracting the activity of the previous tone from the omission, but by subtracting the activity of randomly selected tones across the whole experiment. That is, we randomly selected around 120 long and short tones (i.e., about the same number as the omissions); we computed the ERFs for the long and short tones; we subtracted these ERFs from the ERFs of the corresponding short and long omissions. We clarified these aspects in both the Materials and Methods (ERF analysis paragraph) and Results section.

      Moreover, the subtraction strategy, which is the standard approach to calculate the MMN, allows us to handle possible neural carryover effects arising from the perception of the tone preceding the omission.

      The sentence "Omission responses allow to examine the presence of putative error signals decoupled from bottom-up sensory input, offering a critical test for predictive coding (Walsh et al 2020, Heilbron and Chait, 2018)." simply refers to the fact that the error responses resulting from an omission are purely endogenous, as omissions are just the absence of an expected input (i.e., silence). On the other hand, when a predicted sequence of tones is disrupted by an auditory deviant (e.g., a tone with a different pitch or duration than the expected one), the resulting error response is not purely endogenous, but partially includes the response to the acoustic properties of the deviant.
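
      The subtraction procedure described in this response can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `omission_mmn`, the array layout (trials × sensors × time points), and the synthetic data are hypothetical, and the authors' actual analysis was carried out with their own MEG pipeline.

```python
import numpy as np


def omission_mmn(omission_epochs, tone_epochs, n_tones=120, seed=0):
    """Difference wave: mean omission response minus mean response to
    a random subset of tones drawn from the whole experiment.

    Both inputs are arrays of shape (n_trials, n_sensors, n_times).
    """
    rng = np.random.default_rng(seed)
    n_pick = min(n_tones, len(tone_epochs))
    # Randomly select tone trials without replacement.
    picked = rng.choice(len(tone_epochs), size=n_pick, replace=False)
    tone_erf = tone_epochs[picked].mean(axis=0)
    omission_erf = omission_epochs.mean(axis=0)
    return omission_erf - tone_erf


# Hypothetical data: 120 omission trials and 1000 tone trials,
# 4 sensors, 50 time points each.
rng = np.random.default_rng(1)
omissions = rng.standard_normal((120, 4, 50))
tones = rng.standard_normal((1000, 4, 50))
mmn = omission_mmn(omissions, tones)
print(mmn.shape)
```

      In the design described above, this computation would be run separately for short and long tones, pairing each omission type with the ERF of the corresponding tone type.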

      (4) When multiple clusters emerged from a comparison, only the most significant cluster was reported. Why? 

      We found more than one significant cluster only in the comparison between pure omissions vs tones (figure 2 A, B). The additional significant cluster from this comparison is associated with a P-value of 0.04, emerges slightly earlier in time, and goes in the same direction as the cluster reported in the paper i.e., larger ERF responses for omission vs tones. We added a note specifying the presence of this second cluster, along with a figure on the supplementary material (Supplementary Fig. 1 A, B).

      (5) Fig 2, if ERFs are baseline corrected -50 to 0ms, why do the plots show pre-stimulus amplitudes not centered at 0? 

      This is because we combined the latitudinal and longitudinal gradiometers after baseline correction of the ERF, by computing the root mean square of the signals at each sensor position (see also https://www.fieldtriptoolbox.org/example/combineplanar_pipelineorder/). Because the root mean square is non-negative, the combined pre-stimulus signal is no longer centered at zero. This information is reported in the methods part of the article.
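
      A small numerical sketch may help show why combining gradiometers after baseline correction shifts the pre-stimulus signal away from zero. The code below is an assumption-laden illustration, not the FieldTrip implementation: `combine_planar` is a hypothetical helper that takes the root mean square of two planar gradiometer signals, and the baseline data are synthetic noise.

```python
import numpy as np


def combine_planar(grad_a, grad_b):
    """Root mean square of two planar gradiometer signals (hypothetical helper)."""
    return np.sqrt((grad_a ** 2 + grad_b ** 2) / 2.0)


# Synthetic pre-stimulus baseline for two orthogonal gradiometers.
rng = np.random.default_rng(0)
grad_a = rng.standard_normal(1000)
grad_b = rng.standard_normal(1000)

# Baseline correction: each signal now has zero mean over the baseline window.
grad_a -= grad_a.mean()
grad_b -= grad_b.mean()

combined = combine_planar(grad_a, grad_b)
print(f"mean of corrected gradiometers: {grad_a.mean():.3f}, {grad_b.mean():.3f}")
print(f"mean of combined signal:        {combined.mean():.3f}")
```

      Each corrected gradiometer averages to zero over the baseline, but the combined signal is non-negative at every sample, so its baseline mean is strictly positive.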

      (6) Fig 2, add units to color bars. 

      Sure.

      (7) Fig 2 F and G, put colorbar scale the same for all topographies. 

      Sure, thanks for pointing this out.

      (8) The interaction effect language (Spanish; Basque) X omission type (short; long) appears only in a small cluster of 4 sensors not located at the locations with larger amplitudes to omissions. Authors report it as left frontotemporal, but it seems to me frontocentral with a slight left lateralization.

      (1) The fact that the cluster reflecting the interaction effect does not overlap with the peaks of activity is not surprising in our view. Many sources contribute to the generation of the MMN. The goal of our work was to establish whether there is also evidence for a long-term system (among the many) contributing to this. That is why we perform a first analysis on the whole omission response network (likely including many sources and predictive/attentional systems), and then we zoom in and focus on our hypothesized interaction. We never claim that the main source underlying the omission MMN is the long-term predictive system. 

      (2) The exact location of those sensors is at the periphery of the left-hemisphere omission response, which mainly reflects activity from the left temporal regions. The sensor location of this cluster could be influenced by multiple factors, including (i) the direction of the source dipoles determining an effect; (ii) the combination of multiple sources contributing to the activity measured at a specific sensor location, whose unmixing could be solved only with a beamforming source approach. Based on all the evidence we collected, including the source analyses, we concluded that the major contributors to the sensor-level interaction emerge from both frontal and temporal regions.

      Reviewer #3 (Public Review):

      (1) The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01>P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.

      We appreciate the positive feedback from Reviewer #3. We agree that it would be nice to see this study replicated in the future with larger sample sizes and a behavioral counterpart. Below are a few comments concerning the weakness highlighted: 

      (i) Concerning the sample size: a similar point was raised by Reviewer #1. We report our reply as presented above: “Although a sample size of 20 participants per group can be considered relatively small for detecting an effect in a between-group design, it must be noted that our effect of interest was based on Molnar et al.’s (2016) experiment, where a sample size of 16 subjects per group was sufficient to detect the perceptual grouping effect. In Yoshida et al., 2010, the perceptual grouping effect arose with two groups of 20 7–8-month-old Japanese and English-learning infants. Based on these findings, we believe that a sample size of 20 participants per group can be considered appropriate for the current study”. We clarified these aspects in the new version of the manuscript.

      (ii) We believe that the lack of behavioral data does not undermine the main findings of this study, given the careful selection of the participants and the well-known robustness of the perceptual grouping effect (e.g., Iversen 2008; Yoshida et al., 2010; Molnar et al. 2014; Molnar et al. 2016). As highlighted by Reviewer #2, having Spanish and Basque dominant “speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it”.

      (iii) Regarding the fact that the “crucial effects are all mostly in the .01>P<.05 range”: we want to stress that the approach we used to detect the interaction effect was conservative, using a cluster-based permutation approach with no a priori assumptions about the location of the effect. The robustness of our approach has also been highlighted by Reviewer 2: “Data analyses. Sound, state-of-the-art methodology in the event-related field analyses at the sensor level.” In sum, despite some crucial effects being in the .01>P<.05 range, we believe that the statistical soundness of our analysis, combined with the lack of effect in the control condition, provides compelling evidence for our H1.

      Reviewer #3 (Recommendations For The Authors):

      Figures - Recommend converting all diagrams and plots to vector images to ensure they remain clear when zoomed in the PDF format. 

      Sure, thanks. 

      Figure 1: To improve clarity, the representation of sound durations in panels C and D should be revisited. The use of quavers/eighth notes can be confusing for those familiar with musical notation, as they imply isochrony. If printed in black and white, colour distinctions may be lost, making it difficult to discern the different durations. A more universal representation, such as spectrograms, might be more effective. 

      Thanks for the suggestion. It’s true that the quavers/eighth notes might be confusing in that respect. However, this notation is a relatively standard way to depict paradigms in auditory neuroscience; see, for instance, the two papers below. In the new version of the manuscript, we specified in the figure caption that the notes refer to individual tones, in order to avoid ambiguities.

      - Wacongne, C., Labyt, E., Van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754-20759.

      - Dehaene, S., Meyniel, F., Wacongne, C., Wang, L., & Pallier, C. (2015). The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees. Neuron, 88(1), 2-19.

      Figure 2 : In panel C of Figure 2, please include the exact p-value for the interaction observed. Refrain from using asterisks or "n.s." and opt for exact p-values throughout for the sake of clarity. 

      Thank you for your suggestion. We have included the exact p-value for the interaction in panel C of Figure 2. However, for the remaining figures, we have chosen to maintain the use of asterisks and "n.s.". We would like our pictures to convey the key findings concisely, while the numerical details can be found in the article text. The caption below the image also provides guidance on the interpretation of the p-values: (statistical significance: **p < 0.01, *p < 0.05, and ns p > 0.05).  

      Figure 3 Note typo "Omission reponse"

      Fixed. Thanks for noticing the typo. 

      A note: we moved the figure reflecting the main effect of long tone omission and the lack of main effect of language background (Figure 4 in the previous manuscript) to the supplementary material (Supplementary Figure 2).

      References

      Bendixen, A., SanMiguel, I., & Schröger, E. (2012). Early electrophysiological indicators for predictive processing in audition: a review. International Journal of Psychophysiology, 83(2), 120-131.

      Heilbron, M., & Chait, M. (2018). Great expectations: is there evidence for predictive coding in auditory cortex?. Neuroscience, 389, 54-73.

      Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263-2271.

      Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45-64.

      Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.

      Ross, J. M., & Hamm, J. P. (2020). Cortical microcircuit mechanisms of mismatch negativity and its underlying subcomponents. Frontiers in Neural Circuits, 14, 13.

      Simon, J., Balla, V., & Winkler, I. (2019). Temporal boundary of auditory event formation: An electrophysiological marker. International Journal of Psychophysiology, 140, 53-61.

      Studenova, A. A., Forster, C., Engemann, D. A., Hensch, T., Sander, C., Mauche, N., ... & Nikulin, V. V. (2023). Event-related modulation of alpha rhythm explains the auditory P300 evoked response in EEG. bioRxiv, 2023-02.

      Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356-361.

      Zhang, Y., Yan, F., Wang, L., Wang, Y., Wang, C., Wang, Q., & Huang, L. (2018). Cortical areas associated with mismatch negativity: A connectivity study using propofol anesthesia. Frontiers in Human Neuroscience, 12, 392.

      Ladinig, O., Honing, H., Háden, G., & Winkler, I. (2009). Probing attentive and preattentive emergent meter in adult listeners without extensive music training. Music Perception, 26(4), 377-386. 

      Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological Science, 14(4), 362-366.

      Potter, D. D., Fenwick, M., Abecasis, D., & Brochard, R. (2009). Perceiving rhythm where none exists: Event-related potential (ERP) correlates of subjective accenting. Cortex, 45(1), 103-109.

      Bouwer, F. L., Werner, C. M., Knetemann, M., & Honing, H. (2016). Disentangling beat perception from sequential learning and examining the influence of attention and musical abilities on ERP responses to rhythm. Neuropsychologia, 85, 80-90.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In Drosophila melanogaster, expression of Sex-lethal (Sxl) protein determines sexual identity and drives female development. Functional Sxl protein is absent from males where splicing includes a termination codon-containing "poison" exon. Early during development, in the soma of female individuals, Sxl expression is initiated by an X chromosome counting mechanism that activates the Sxl establishment promoter (SxlPE) to produce an initial amount of Sxl protein. This then suppresses the inclusion of the "poison" exon, directing the constructive splicing of Sxl transcripts emerging from the Sxl maintenance promoter (SxlPM) which is activated at a later stage during development irrespective of sex. This autoregulatory loop maintains Sxl expression and commits to female development. 

      Sxl also determines the sexual identity of the germline. Here Sxl expression generally follows the same principles as in somatic tissues, but the way expression is initiated differs from the soma. This regulation has so far remained elusive. 

      In the presented manuscript, Goyal et al. show that activation of Sxl expression in the germline depends on additional regulatory DNA sequences, or sequences different from the ones driving initial Sxl expression in the soma. They further demonstrate that sisterless A (sisA), a transcription factor that is required for activation of Sxl expression in the soma, is also necessary, but not sufficient, to initiate the expression of functional Sxl protein in female germ cells. sisA expression precedes Sxl induction in the germline and its ablation by RNAi results in impaired expression of Sxl, formation of ovarian tumors, and germline loss, phenocopying the loss of Sxl. Intriguingly, this phenotype can be rescued by the forced expression of Sxl, demonstrating that the primary function of sisA in the germline is the induction of Sxl expression. 

      Strengths: 

      The clever design of probes (for RNA FISH) and reporters allowed the authors to dissect Sxl expression from different promoters to get novel insight into sex-specific gene regulation in the germline. All experiments are carefully controlled. Since Sxl regulation differs between the soma and the germline, somatic tissues provide elegant internal controls in many experiments, ensuring e.g. functionality of the reporters. Similarly, animals carrying newly generated alleles (e.g. genomic tagging of the Sxl locus) are fertile and viable, demonstrating that the genetic manipulation does not interfere with protein function. The conclusions drawn from the experimental data are sound and advance our understanding of how Sxl expression is induced in the female germline. 

      Weaknesses: 

      The assays employed by the authors provide valuable information on when Sxl promoters become active. However, since no information on the stability of the gene products (i.e. RNA and protein) is available, it remains unclear when the SxlPE promoter is switched off in the germline (conceptually it only needs to be active for a short time period to initiate production of functional Sxl protein). As correctly stated by the authors, the persisting signals observed in the germline might therefore not reflect the continuous activity of the SxlPE promoter. 

      Mapping of regulatory elements and their function: SxlPE with 1.5 kb of flanking upstream sequence is sufficient to recapitulate early Sxl expression in the soma. The authors now provide evidence that beyond that, additional DNA sequences flanking the SxlPE promoter are required for germline expression. However, a more precise mapping was not performed. Also, due to technical limitations, the authors could not precisely map the sisA binding sites. Since this protein is also involved in the somatic induction of Sxl, its binding sites likely reside in the region 1.5kb upstream of the SxlPE promoter, which has been reported to be sufficient for somatic regulation. The regulatory role of the sequences beyond SxlPE-1.5kb therefore remains unaddressed and it remains to be investigated which trans-acting factor(s) exert(s) its/their function(s) via this region. 

      We agree that a more precise mapping of the essential elements within the 10.2 kb reporter is an important direction in which to proceed. Unfortunately, this is out of the scope of the current manuscript given current lab personnel. In regard to the 1.5 kb promoter that activates SxlPE in the soma, we do not feel that the SisA binding sites are necessarily in this region. It is important to note that, while the 1.5 kb promoter is sufficient for female-specific expression in the soma, it may not contain all of the regulatory elements that normally regulate PE from the endogenous locus. Activation of PE in the soma is thought to be regulated by a combination of positive-acting factors (SisA, SisB, etc.) and repressive factors (e.g. Dpn) that set a threshold for PE activation. Much more work would need to be done to determine whether all of these factors bind to the 1.5 kb promoter, or whether additional sequences are also involved to control the proper timing and robustness of normal Sxl PE activation in the soma.

      The central question of how Sxl expression is initiated and controlled in the germline still remains unanswered. Since sisA is zygotically expressed in both the male and the female germline (Figure 4D), it is unlikely the factor that restricts Sxl expression to the female germline. 

      X chromosome “counting” elements like SisA are always expressed in both males and females, but it is thought that the 2X dose of them in females activates PE, while the 1X dose in males does not. Thus, we do expect SisA to be expressed in both males and females, as we observed.

      How does weak expression of Sxl in male tissues or expression above background after knockdown of sisA reconcile with the model that an autoregulatory feedback loop enforces constant and clonally inheritable Sxl expression once Sxl is induced? Is the current model for Sxl expression too simple or are we missing additional factors that modulate Sxl expression (such as e.g. Sister of Sex-lethal)? While I do not expect the authors to answer these questions, I would expect them to appropriately address these intriguing aspects in the discussion. 

      It is difficult to know what is “background” and what is actual weak Sxl expression in males. We agree that, if it is real, then why it doesn’t activate autoregulation of the Sxl PM transcript is mysterious. And yes, the current model for female-specific expression of Sxl in the soma may well be incomplete. Sxl PM transcript is present in the testis based on community RNA-seq data and our own analysis of male vs. female bam-mutant gonads (PMID 31329582), but it is at lower levels. Whether the lower level in the testis is due to tissue differences or sex-specific regulation of RNA levels is unknown. Our observations that the HA-tagged Sxl Early protein remains present in somatic cells in L1 larvae, and that GFP expression from the 10.2 kb Sxl PE-GFP can be detected in the soma until L2, could be due either to perdurance of the protein products or to continued sex-specific expression of PE long after the time that it was thought to shut off. This is also long after dosage compensation should have equalized X chromosome gene expression, meaning that X chromosomes can no longer be “counted” by factors like SisA and SisB. Thus, sex-specific expression of PE at this time would require another mechanism besides the current model (such as feedback regulation of Sxl PE transcription from downstream factors).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors wanted to determine whether cis-acting factors of Sxl - two different Sxl promoters in somatic cells - regulate Sxl in a similar way in germ cells. They also wanted to determine whether trans-acting factors known to regulate Sxl in the soma also regulate Sxl in the germline. 

      Regarding the cis-acting factors, they examine the Sxl "establishment promoter" (SxlPE) that is activated in female somatic cells by the presence of two X chromosomes. Slightly later in development, dosage compensation equalizes X chromosome expression in males and females and so X chromosomes can no longer be counted. The second Sxl promoter is the "maintenance promoter" (SxlPM), which is activated in both sexes. The mRNA produced from the maintenance promoter has to be alternatively spliced by early Sxl protein generated earlier in development from the PE. This leads to an autoregulatory loop that maintains Sxl expression in female somatic cells. The authors used fluorescent in situ hybridization (FISH) with oligopaints to determine the temporal activation of the PE or PM promoters. They find that - unlike the soma - the PE does not precede the PM and instead is activated contemporaneously with or later than the PM - this is confusing with the later results (see below). Next, they generated transcriptional reporter constructs containing large segments of the Sxl locus: the 1.5 kb reporter used in somatic studies, a 5.2 kb reporter, and a 10.2 kb reporter. Interestingly, expression of the 1.5 kb reporter, which was reported to recapitulate Sxl expression in soma and germline, was not observed by the authors. The 5.2 kb reporter was observed in female somatic cells but not in germ cells. Only when they included an additional 5 kb downstream of the 5.2 kb reporter (here the 10.2 kb reporter) did they see expression in germ cells, but this occurred at the L1 stage. Their data indicate that Sxl activity in the germline requires different cis-regulation than the soma and that the PE is activated later in germ cells than in somatic cells. The authors next use gene editing to insert epitope tags in two distinct strains in the hopes of creating an early Sxl and a later Sxl protein derived from the PE and PM, respectively. 
      The HA-tagged protein from the PE was seen in somatic cells but never in the germline, possibly due to very low expression. The FLAG-tagged late Sxl protein is observed in L2 germ cells. Because the early HA-Sxl protein is not perceptible in germ cells, it is not possible to conclude its role in the germline. However, because late FLAG-Sxl was only observed in L2 germ cells and the PE was detected in L1, this leaves open the possibility that PE produces early HA-Sxl (which currently cannot be detected), which then alternatively splices the transcript from the PM. In other words, the soma and germline could have a similar temporal relationship between the two Sxl promoters. While I agree with the authors about this conclusion, the earlier work with the oligopaints leads to the conclusion that PE is active after PM. This is confusing. 

      The temporal relationship between Sxl PE and Sxl PM in the germline is indeed confusing. One source of confusion comes from whether one is discussing Sxl protein production or promoter activity. As the reviewer nicely summarizes, our transcription analysis with oligopaints indicates that, unlike in the soma, Sxl PE is NOT on in the germline prior to PM. Our other data indicate that PE is instead likely only active well after transcription from PM has begun. However, this still means that the temporal order of the EARLY and LATE Sxl proteins can be the same as the soma. Even if PM is active well before PE in the germline, the PE transcript cannot produce any functional protein in the absence of being alternatively spliced by the Sxl protein (Sxl autoregulation). Thus, even if PM is active before PE in the germline, we would not expect to observe any LATE Sxl protein until the PE promoter comes on, and produces a pulse of EARLY Sxl protein. The fact that we observe LATE Sxl protein at L2 is consistent with our observation that the 10.2 kb Sxl PE reporter is active at L1. We will attempt to explain all of this better in a revised manuscript.

      Next, the authors wanted to turn their attention to the trans-acting factors that regulate Sxl in the soma, including Sisterless A (SisA), SisB, Runt, and the JAK/STAT ligand Unpaired. Using germline RNAi, the authors found that only knockdown of SisA causes ovarian tumors, similar to the loss of Sxl, suggesting that SisA regulates Sxl (i.e. the PE) in both the soma and the germline. They generated a SisA null allele using CRISPR/Cas9 and these animals had ovarian tumors and germ cell-less ovaries. FISH revealed that sisA is activated in primordial germ cells in stages 3-6 before the activation of Sxl. They used CRISPR-Cas9 to generate an endogenously-tagged SisA and found that tagged SisA was expressed in stage 3-6 PGCs, which is consistent with activating PE in the germline. They showed that sisA is upstream of Sxl as germline depletion of sisA led to a significant decrease in expression from the 10.2 kb PE reporter and in Sxl protein. The authors could rescue the ovarian tumors and loss of Sxl protein upon germline depletion of sisA by supplying Sxl from another promoter (the otu promoter). These data indicate that sisA is necessary for Sxl activation in the germline. However, ectopic sisA in germ cells in the testis did not lead to ectopic Sxl, suggesting that sisA is not sufficient to activate Sxl in the germline. 

      Strengths: 

      (1) The genetic and genomic approaches in this study are top-notch and they have generated reagents that will be very useful for the field. 

      (2) Excellent use of powerful approaches (oligo paint, reporter constructs, CRISPR-Cas9 alleles). 

      (3) The combination of state of art approaches and quantification of phenotypes allows the authors to make important conclusions. 

      Weaknesses: 

      (1) Confusion in line 127 (this indicates that SxlPE is not activated before SxlPM in the germline) about PE not being activated before the PM in the germline when later figures show that PE is activated in L1 and late Sxl protein is seen in L2. It would be helpful to the readers if the authors edited the text to avoid this confusion. Perhaps more explanation of the results at specific points would be helpful. 

      We agree--see response above.

      Reviewer #3 (Public Review): 

      Summary: 

      The mechanisms governing the initial female-specific activation of Sex-lethal (Sxl) in the soma, the subsequent maintenance of female-specific expression and the various functions of Sxl in somatic sex determination and dosage compensation are well documented. While Sxl is also expressed in the female germline where it plays a critical role during oogenesis, the pathway that is responsible for turning Sxl on in germ cells has been a long-standing mystery. This manuscript from Goyal et al describes studies aimed at elucidating the mechanism(s) for the sex-specific activation of the Sex-lethal (Sxl) gene in the female germline of Drosophila. 

      In the soma, the Sxl establishment promoter, Sxl-Pe, is regulated in pre-cellular blastoderm embryos in somatic cells by several X-linked transcription factors (sis-a, sis-b, sis-c and runt). At this stage of development, the expression of these transcription factors is proportional to gene dose, 2x in females and 1x in males. The cumulative two-fold difference in the expression of these transcription factors is sufficient to turn Sxl-Pe on in female embryos. Transcripts from the Sxl-Pe promoter encode an "early" version of the female Sxl protein, and they function to activate a positive splicing autoregulatory loop by promoting the female-specific splicing of the initial pre-mRNAs derived from the Sxl maintenance promoter, Sxl-Pm (which is located upstream of Sxl-Pe). These female Sxl-Pm mRNAs encode a Sxl protein with a different N-terminus from the Sxl-Pe mRNAs, and they function to maintain female-specific splicing in the soma during the remainder of development. 

      In this manuscript, the authors are trying to understand how the Sxl-Pm positive autoregulatory loop is established in germ cells. If Sxl-Pe is used and its activation precedes Sxl-Pm as is true in the soma, they should be able to detect Sxl-Pe transcripts in germ cells before Sxl-Pm transcripts appear. To test this possibility, they generated RNA FISH probes complementary to the Sxl-Pe first exon (which is part of an intron sequence in the Sxl-Pm transcript) and to a "common sequence" that labels both Sxl-Pe and Sxl-Pm transcripts. Transcripts labeled by both probes were detected in germ cells beginning at stage 5 (and reaching a peak at stage 10), so either the Sxl-Pm and Sxl-Pe promoters turn on simultaneously, or Sxl-Pe is not active. 

      They next switched to Sxl-Pe reporters. The first Sxl-Pe:gfp reporter they used has a 1.5 kb upstream region which in other studies was found to be sufficient to drive sex-specific expression in the soma of blastoderm embryos. Also like the endogenous Sxl gene it is not expressed in germ cells at this early stage. In 2011, Hashiyama et al reported that this 1.5 kb promoter fragment was able to drive gfp expression in Vasa-positive germ cells later in development in stage 9/10 embryos. However, because of the high background of gfp in the nearby soma, their result wasn't especially convincing. Though they don't show the data, Goyal et al indicated that unlike Hashiyama et al they were unable to detect gfp expressed from this reporter in germ cells. Goyal et al extended the upstream sequences in the reporter to 5 kb, but they were still unable to detect germline expression of gfp. 

      Goyal et al then generated a more complicated reporter which extends 5 kb upstream of the Sxl-Pe start site and 5 kb downstream-ending at or near 4th exon of the Sxl-Pm transcript (the Sxl-Pe10 kb reporter). (The authors were not explicit as to whether the 5 kb downstream sequence extended beyond the 4th exon splice junction-in which case splicing could potentially occur with an upstream exon(s)-or terminated prior to the splice junction as seems to be indicated in their diagram.) With this reporter, they were able to detect sex-specific gfp expression in the germline beginning in L1 (first instar larva). With the caveat that gfp detection might be delayed compared to the onset of reporter activation, these findings indicated that the sequences in the reporter are able to drive sex-specific transcription in the germline at least as early as L1. 

      The authors next tagged the N-terminal end of the Sxl-Pe protein with HA (using Crispr/Cas9) and the N-terminal end of Sxl-Pm protein with Flag. They report that the HA-Sxl-Pe protein is first detected in the soma at stage 9 of embryogenesis. Somatic HA-Sxl-Pe protein persists into L1, but is no longer detected in L2. However, while somatic HA-Sxl-Pe protein is detected, they were unable to detect HA-Sxl-Pe protein in germ cells. In the case of FLAG-Sxl-Pm, it could first be detected in L2 germ cells indicating that at this juncture the Sxl-positive autoregulatory loop has been activated. This contrasts with Sxl-Pm transcripts which are observed in a few germ cells at stage 5 of embryogenesis, and in most germ cells by stage 10. The authors propose (based on the expression pattern of the Sxl-Pe10kb reporter and the appearance of Flag-Sxl-Pm protein) that Sxl-Pe comes on in germ cells in L1, and that the Sxl-Pe protein activates the female splicing of Sxl-Pm transcripts, giving detectable Flag-Sxl-Pm proteins beginning in L2. 

      To investigate the signals that activate Sxl-Pe in germ cells, the authors tested four of the X-linked genes (sis-a, sis-b, sis-c, and runt) that function to activate Sxl-Pe in the soma in early embryos. RNAi knockdown of sis-b, sis-c, and runt had no apparent effect on oogenesis. In contrast, knockdown of sis-a resulted in tumorous ovaries, a phenotype associated with Sxl mutations. (Three different RNAi transgenes were tested-two gave this phenotype, the third did not.) Sxl-Pe10kb reporter activity in L1 female germ cells is also dependent on sis-A. 

      Several approaches were used to confirm a role for sis-a in a) oogenesis and b) the activation of the Sxl-Pm autoregulatory loop. They showed that sis-a germline clones (using tissue-specific Crispr/Cas9 editing) resulted in the tumorous ovary phenotype and reduced the expression of Sxl protein in these ovaries. They found that sis-a transcripts and GFP-tagged Sis-A protein are present in germ cells. Finally, they showed that the tumorous ovary phenotype induced by germline RNAi knockdown of sis-a can be partially rescued by expressing Sxl in the germ cells. 

      Critique: 

      While this manuscript addresses a longstanding puzzle - the mechanism activating the Sxl autoregulatory loop in female germ cells-and likely identified an important germline transcriptional activator of Sxl, sis-a, the data that they've generated doesn't make a compelling story. At every step, there are puzzle pieces that don't fit the narrative. In addition, some of their findings are inconsistent with many previous studies. 

      We respect and appreciate this reviewer for the detailed comments. However, we feel that the claim that our work doesn’t “make a compelling story” and that many “pieces…don’t fit the narrative” is incorrect. The main issue that this reviewer raises is that we do not know if Sxl “early” transcription in the germline initiates from the Pe promoter. This is true, which we fully acknowledge, but the detail of whether “germline early” transcription of Sxl initiates from Pe or from another, as yet undefined, germline promoter does not affect the main conclusions of the paper. These conclusions are that (1) regulation of Sxl in the germline is fundamentally different from in the soma and (2) despite point (1), sisA acts as an activator of Sxl in both the soma and the germline. Neither of these main points is disputed by this reviewer.

      (1) The authors used RNA FISH to time the expression of Sxl-Pe and Sxl-Pm transcripts in germ cells. Transcripts complementary to Sxl-Pe and Sxl-Pm were detected at the same time in embryos beginning at stage 5. This is not a definitive experiment as it could mean a) that Sxl-Pe and Sxl-Pm turn on at the same time, b) that Sxl-Pe comes on after Sxl-Pm (as suggested by the Sxl-Pe10kb reporter) or c) Sxl-Pe never comes on. 

      When designing this experiment, we wanted to test whether the “soma model” of Pe activation before Pm was also true in the germ cells. Our data clearly demonstrate that transcripts beginning downstream of Pe are not expressed prior to transcripts beginning downstream of Pm. Thus, we can state that the “soma model” of Pe first and then Pm does not occur in the germline, which is very interesting. However, we cannot make any other conclusions about Pe in the germline from these data, as the reviewer indicates.

      (2) Hashiyama et al reported that they detected gfp expression in stage 9/10 germ cells from a 1.5 kb Sxl-Pe-gfp. As noted above, this result wasn't entirely convincing and thus it isn't surprising that Goyal et al were unable to reproduce it. Extending the upstream sequences to just before the 1st exon of Sxl-Pm transcripts also didn't give gfp expression in germ cells. Only when they added 5 kb downstream did they detect gfp expression. However, from this result, it isn't possible to conclude that the Sxl-Pe promoter is actually driving gfp expression in L1 germ cells. Instead, the Sxl promoter active in the germ line could be anywhere in their 10 kb reporter. 

      We agree that we have not determined the transcriptional start sites for Sxl in the germline and it is possible that the 10.2 kb reporter uses a different promoter than Pe, as long as that transcript can also be spliced into exon 4 where the GFP tag has been placed. The three types of experiments conducted—FISH to regions of the nascent transcripts, tagged versions of the different predicted ORFs, and promoter-GFP constructs—are extensive, but all have different limitations. Indeed, it would be challenging to determine the transcription start sites in the germline, as it would require obtaining enough L1 larvae to be able to dissociate the animals, or isolated gonads, into single cells in order to FACS purify the germ cells for RACE or long-read sequencing (I’m not sure that L1 larval single-nucleus seq would be enough for calling start sites). Otherwise, there would be no way to determine if expected or unexpected transcripts came from the soma or the germline. We can consider these experiments in the future.

      Fortunately, the main conclusions from this paper do not require knowing whether the germline uses Pe or some other “germline early” promoter that can produce Sxl protein in the absence of autoregulation by existing Sxl protein. The observations that a nascent transcript including the region downstream of Pm is observed in embryonic germ cells, but that the tagged LATE protein is not observed until L2, suggest that the transcript produced in early germ cells cannot produce a functional protein. This is consistent with the need for Sxl autoregulation of the Pm transcript in the germline as in the soma, as was previously thought. This is further supported by the observations that activity of the 10.2 kb reporter is only observed in L1 germ cells, and that the LATE Sxl protein is only observed in germ cells after this point. Thus, we can conclude that either Pe, or another “germline early” promoter, acts to produce female-specific Sxl protein to initiate autoregulation of Sxl splicing and protein production in the germline. We feel that this is a significant advance for the field, and we will make it more clear in the text that the initial expression of Sxl in the germline may not be from the Pe promoter.

      Other conclusions of the manuscript are unaffected by the start site for “germline early” Sxl transcription, including that the germline activates Sxl protein expression much later than the soma, which calls into question previous work indicating an early role for Sxl in the germline. Also unaffected is our conclusion that different enhancer sequences are required for activation of Sxl expression in the germline than in the soma, consistent with previous work demonstrating that the genetics of Sxl activation in the germline are different than in the soma. Lastly, our conclusions that sisA acts upstream of Sxl, and is required for Sxl germline expression, either directly or indirectly, are also unaffected by the nature of the Sxl “germline early” start site.

      (3) At least one experiment suggests that Sxl-Pe never comes on in germ cells. The authors tagged the N-terminus of the Sxl-Pe protein with HA and the N-terminus of the Sxl-Pm protein with Flag. Though they could detect HA-Sxl-Pe protein in the soma, they didn't detect it in germ cells. On the other hand, the Flag-Sxl-Pm protein was detected in L2 germ cells (but not earlier). These results would more or less fit with those obtained for the 10 kb reporter and would support the following model: Prior to L1, Sxl-Pm transcripts are expressed and spliced in the male pattern in both male and female germ cells. During L1, Sxl protein expressed via a mechanism that depends upon a 10 kb region spanning Sxl-Pe (but not on Sxl-Pe) is produced and by L2 there are sufficient amounts of this protein to switch the splicing of Sxl-Pm transcripts from a male to a female pattern-generating Flag-tagged Sxl-Pm protein. 

      As described above, it is indeed possible that another promoter besides Pe is active as the “germline early” promoter. We will make this more clear in a revised version, but the major conclusions of the manuscript are unaffected.

      (4) The 10kb reporter is sex-specific, but not germline-specific. The levels of gfp in female L1 somatic cells are equal to if not greater than those in L1 female germ cells. That the Sxl-Pe10kb reporter is active in the soma complicates the conclusion that it represents a germline-specific promoter. Germline activity is, however, sensitive to sis-A knockdowns, which is a plus. Presumably, somatic expression of the reporter wouldn't be sensitive to a (late) sis-A knockdown, but this wasn't shown. 

      We are confused by this comment because we do not conclude that the Pe is a germline-specific promoter. Pe is known to be expressed in the soma, from considerable previous work cited by this reviewer, and the simplest model is that Pe is used in both the soma and the germline, as reflected by our 10.2 kb reporter. It is actually quite interesting how late this promoter seems active in the soma, contrary to current dogma, but we did not study somatic activation of Sxl in this work.

      (5) Their results with the HA-Sxl-Pe protein don't fit with many previous studies-assuming that the authors have explained their results properly. They report that HA-Sxl-Pe protein is first detected in the soma at stage 9 of embryogenesis and that it then persists till L2. However, previous studies have shown that Sxl-Pe transcripts and then Sxl-Pe proteins are first detected in ~NC11-NC12 embryos. In RNase protection experiments, the Sxl-Pe exon is observed in 2-4 hr embryos, but not detected in 5-8 hr, 14-12 hr, L1, L2, L3, or pupae. Northerns give pretty much the same picture. Western blots also show that Sxl-Pe proteins are first detectable around the blastoderm stage. So it is not at all clear why HA-Sxl-Pe proteins are first observed at stage 9 which, of course, is well after the time that the Sxl-Pm autoregulatory loop is established. 

      Given the obvious problems with the initial timing of somatic expression described here, it is hard to know what to make of the fact that HA-tagged Sxl-Pe proteins aren't observed in germ cells. 

      As for the presence of HA-Sxl-Pe proteins later than expected: While RNase protection/Northern experiments showed that Sxl-Pe mRNAs are expressed in 2-4 hr embryos and disappear thereafter, one could argue from the published Western experiments that the Sxl-Pe proteins expressed at the blastoderm stage persist at least until the end of embryogenesis, though perhaps at somewhat lower levels than at earlier points in development. So the fact that Goyal et al were able to detect HA-Sxl-Pe proteins in stage 9 embryos and later on in L1 larvae probably isn't completely unexpected. What is unexpected is that the HA-Sxl-Pe proteins weren't present earlier. 

      We thank the reviewer for this detailed analysis. Since we were not focused on somatic expression of Sxl in this work, it is possible that stage 9 was the earliest stage we observed in our experiments, rather than the earliest stage in which it is ever observed. We will repeat these experiments to verify when the HA-tagged early Sxl protein is first observed. However, these comments have no bearing on our conclusions about Sxl expression in the germline, which is the focus of this manuscript.

      (6) The authors use RNAi and germline clones to demonstrate that sis-A is required for proper oogenesis: when sis-A activity is compromised in germ cells, i) tumorous ovary phenotypes are observed and ii) there is a reduction in the expression of Sxl-Pm protein. They are also able to rescue the phenotypic effects of sis-a knockdown by expressing a Sxl-Pm protein. While the experiments indicating sis-a is important for normal oogenesis and that at least one of its functions is to ensure that sufficient Sxl is present in the germline stem cells seem convincing, other findings would make the reader wonder whether Sis-A is actually functioning (directly) to activate Sxl transcription from promoter X. 

      It is true that we do not know the binding specificity for SisA, which is why we have made no claims about the directness of SisA regulation of Sxl. This does not change our conclusions that sisA is upstream of Sxl activation, since loss of sisA function has a similar phenotype to loss of Sxl, loss of sisA blocks Sxl protein expression, and expression of Sxl rescues the sisA mutant phenotype.

      The authors show that sis-a mRNAs and proteins are expressed in stage 3-5 germ cells (PGCs). This is not unexpected as the X-linked transcription factors that turn Sxl-Pe on are expressed prior to nuclear migration, so their protein products should be present in early PGCs. The available evidence suggests that their transcription is shut down in PGCs by the factors responsible for transcriptional quiescence (e.g., nos and pgc), in which case transcripts might be detected in only one or two PGCs, which fits with their images. However, it is hard to believe that expression of Sis-A protein in pre-blastoderm embryos is relevant to the observed activation of the Sxl-Pm autoregulatory loop hours later in L2 larvae. 

      It is also not clear how the very low level of gfp-Sis-A seen in only a small subset of migrating germ cells in stage 10 embryos (Figure S6) would be responsible for activating the Sxl-Pe10kb reporter in L1. It seems likely that the small amount of protein seen in stage 10 embryos is left over from the pre-cellular blastoderm stage. In this case, it would not be surprising to discover that the residual protein is present in both female and male stage 10 germ cells. This would raise further doubts about the relevance of the gfp-Sis-A at these early stages. 

      In fact, given the evidence presented implicating sis-a in activating Sxl, (the germline activation of the Sxl-Pe10kb reporter, the RNAi knockdowns, and the germ cell-specific sis-a clones) it is clear that the sis-A RNAs and proteins seen in pre-cellular blastoderm PGCs aren't relevant. The germline clone experiment (and also the RNAi knockdowns) indicates that sis-A must be transcribed in germ cells after Cas9 editing has taken place. Presumably, this would be after transcription is reactivated in the germline (~stage 10) and after the formation of the embryonic gonad (stage 14) so that the somatic gonadal cells can signal to the germ cells. With respect to the reporter, the relevant time frame for showing that sis-A is present in germ cells would be even later in L1. 

      The reviewer is correct in wondering how early sisA transcription can affect late Sxl activation, and we are clear about this conundrum in our manuscript. However, they are incorrect about the early sisA expression. Our experiments examining nascent sisA transcripts indicate that sisA is zygotically expressed in the formed germ cells rather than being leftover from expression in early nuclei. The fact that only a portion of germ cells express sisA at any time may well be due to a timing issue, where not all germ cells express sisA at the same time. They are also incorrect about the timing of Cas9 editing in the germline—the guide RNAs are expressed from a general promoter that is active both maternally and in the early embryo, and the Cas9 RNA from the nos promoter is deposited in the germ plasm where it is translated long before cellularization, meaning that sisA CRISPR knockout can begin at the earliest stages of germ cell formation or before.

      (7) As noted above, the data in this manuscript do not support the idea that Sxl-Pe proteins activate the Sxl-Pm female splicing in the germline. Flybase indicates that there is at least one other Sxl promoter that could potentially generate a transcript that includes the male exon but still could encode a Sxl protein. This promoter "Sxl-Px" is located downstream of Sxl-Pm and from its position it could have been included in the authors' 10 kb reporter. The reported splicing pattern of the endogenous transcript skips exon2, and instead links an exon just downstream of Sxl-Px to the male exon. The male exon is then spliced to exon4. If the translation doesn't start and end at one of the small upstream orfs in the exons close to Sxl-Px and the male exon, a translation could begin with an AUG codon in exon4 that is in frame with the Sxl protein coding sequence. This would produce a Sxl protein that lacks aa sequences from N-terminus, but still retains some function. 

      Another possible explanation for how gfp is expressed from the 10 kb reporter is that the transcript includes the "z" exon described by Cline et al., 2010.

      As discussed above, the exact location of the start site for the Sxl transcript in the germline remains to be determined, but does not affect the main conclusions of the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      First, all the experiments are performed in Jurkat T cells that may not recapitulate the regulation of polarization in primary T cells.

      To extend our results in Jurkat cells forming IS to primary cells, we have now performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. In addition, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript, to be faithful to the main body of our results. New sentences dealing with this important issue have been included in the Results and Discussion sections.

      Moreover, all the experiments analyzing the role of PKCdelta are performed in one clone of wt or PKCdelta KO Jurkat cells. This is problematic since clonal variation has been reported in Jurkat T cells.

      Referee is right; this is why we studied three different control clones (C3, C9, C7) and three PKCdelta-interfered clones (P5, P6 and S4), all derived from the JE6.1 clone, and the results have been previously published (Herranz et al 2019)(Bello-Gamboa et al 2020). All these clones expressed similar levels of the relevant cell surface molecules and formed synaptic conjugates with similar efficiency (Herranz et al 2019). The P5, P6 and S4 clones exhibited a similar defect in MVB/MTOC polarization when compared with the control clones (Herranz et al 2019)(Bello-Gamboa et al 2020). Experiments carried out by other researchers using a different clone of Jurkat (JE6.1) and primary CD4+ and CD8+ lymphocytes interfered in FMNL1 (Gomez et al. 2007) showed a defect in MTOC polarization comparable to that found in our control clones when they were transiently interfered in FMNL1 (Bello-Gamboa et al 2020, this manuscript). In this manuscript we have studied, instead of the canonical JE6.1 clone, the C3 and C9 control clones derived from JE6.1, since the puromycin-resistant control clones (containing a scramble shRNA) were isolated by limiting dilution together with the PKCdelta-interfered clones (Herranz et al. 2019); thus, C3 and C9 are the best possible controls to compare with the P5 and P6 clones. Please note that microsatellite analyses, available upon request, support the identity of our C3 clone with JE6.1. Moreover, when GFP-PKCdelta was transiently expressed in the three PKCdelta-interfered clones, MTOC/MVB polarization was recovered to control levels (Herranz et al. 2019). Therefore, the deficient MTOC/MVB polarization in all these clones is exclusively due to the reduction in PKCdelta expression (Herranz et al 2019), and thus clonal variation cannot underlie our results in stable clones.
      We have now included new sentences to address this important point and to mention the inability of FMNL1betaS1086D to revert the deficient MTOC polarization occurring in the P6 PKCdelta-interfered clone, as occurred in the P5 clone. Because we have now included more figures and panels to address the editor's and referees' comments, we have not included the dot plot data corresponding to the C9 and P6 clones, to avoid an overly long and repetitive manuscript. Since all the FMNL1 interference and FMNL1 variant re-expression experiments were performed in transient assays (2-4 days after transfection), there was no chance for any clonal variation in these short-time experiments. Moreover, internal controls using untransfected cells or Raji cells unpulsed with SEE were carried out in all these transient experiments.

      Finally, although convincing, the defect in the secretion of vesicles by T cells lacking phosphorylation of FMNL1beta on S1086 is preliminary. It would be interesting to analyze more precisely this defect. The expression of the CD63‑GFP in mutants by WB is not completely convincing. Are other markers of extracellular vesicles affected, e.g. CD3 positive?

      We acknowledge this comment. It is true that the mentioned results do not directly demonstrate the presence of exosomes at the synaptic cleft of the synapses, since the nanovesicles were harvested from the cell culture supernatants from synaptic conjugates and these nanovesicles could be produced by multi‑directional degranulation of MVBs. To address this important issue, we have performed STED super‑resolution imaging of the immune synapses made by control and FMNL1-interfered cells. Nanosized (100-150 nm) CD63+ vesicles can be found in the synaptic cleft between APC and control cells with polarized MVBs, whereas we could not detect these vesicles in the synaptic cleft from FMNL1-interfered cells that maintain unpolarized MVBs (new Fig. 10). New sentences have been included in the Results and Discussion dealing with this important point. Regarding the use of CD3 as a marker of extracellular vesicles, please note that CD3 is neither an enriched nor a specific marker of exosomes, since it is also present in plasma membrane shedding vesicles, molting vesicles from microvilli, apoptotic bodies and small cell fragments, apart from exosomes. We have therefore preferred to use the canonical exosome marker CD63 as a general exosome reporter readout for WB and immunofluorescence (MVBs, exosomes), time-lapse of MVBs (Suppl. Video 8) and super-resolution experiments (Fig. 10).

      Reviewer #2 (Public Review):

      Summary:

      The authors have addressed the role of S1086 in the FMNL1beta DAD domain in F-actin dynamics, MVB polarization, and exosome secretion, and investigated the potential implication of PKCdelta, which they had previously shown to regulate these processes, in FMNL1beta S1086 phosphorylation. This is based on:

      (1) the documented role of FMNL1 proteins in IS formation

      (2) their ability to regulate F-actin dynamics

      (3) the implication of PKCdelta in MVB polarization to the IS and FMNL1beta phosphorylation

      (4) the homology of the C-terminal DAD domain of FMNL1beta with FMNL2, where a phosphorylatable serine residue regulating its auto-inhibitory function had been previously identified.

      They demonstrate that FMNL1beta is indeed phosphorylated on S1086 in a PKCdelta-dependent manner and that S1086-phosphorylated FMNL1beta acts downstream of PKCdelta to regulate centrosome and MVB polarization to the IS and exosome release. They provide evidence that FMNL1beta accumulates at the IS where it promotes F-actin clearance from the IS center, thus allowing for MVB secretion.

      Strengths

      The work is based on a solid rationale, which includes previous findings by the authors establishing a link between PKCdelta, FMNL1beta phosphorylation, synaptic F-actin clearance, and MVB polarization to the IS. The authors have thoroughly addressed the working hypotheses using robust tools. Among these, of particular value is an expression vector that allows for simultaneous RNAi-based knockdown of the endogenous protein of interest (here all FMNL1 isoforms) and expression of wild-type or mutated versions of the protein as YFP-tagged proteins to facilitate imaging studies. The imaging analyses, which are the core of the manuscript, have been complemented by immunoblot and immunoprecipitation studies, as well as by the measurement of exosome release (using a transfected MVB/exosome reporter to discriminate exosomes secreted by T cells).

      Weaknesses

      The data on F-actin clearance in Jurkat T cells knocked down for FMNL1 and expressing wild-type FMNL1 or the non-phosphorylatable or phosphomimetic mutants thereof would need to be further strengthened, as this is a key message of the manuscript. Also, the entire work has been carried out on Jurkat cells. Although this is an excellent model easily amenable to genetic manipulation and biochemical studies, the key finding should be validated on primary T cells.

      Referee’s global assessment is right. To extend our results in Jurkat cells forming IS, we have now performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. In addition, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript, to be faithful to the main body of our results. New sentences have been included in Results and Discussion to address these important points.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study shows the role of the phosphorylation of FMNL1b on S1086 in the polarity of T lymphocytes, which is a new and interesting finding. It would be important to confirm some of the key results in primary T cells and to analyze in depth the defect in actin remodeling (quantification of the images, analysis of some key actors of actin remodeling). The description of the defect in the secretion of extracellular vesicles would also benefit from a more accurate analysis of the content of vesicles.

      Referee is right. We have now performed experiments using synapses containing Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). These experiments clearly show the presence of FMNL1 at these two different IS classes, similar to what was found in Jurkat-Raji synapses. Moreover, since most of the experiments were performed in Jurkat cells, we have changed the title of our manuscript, to be faithful to the main body of our results. Regarding the use of CD63 instead of other markers such as CD3 (as stated by the other referee), please note that CD3 is neither an enriched nor a specific marker of exosomes, since it is also present in plasma membrane shedding vesicles, molting vesicles from microvilli, apoptotic bodies and small cell fragments, apart from exosomes. We have therefore preferred to use the accepted consensus, canonical extracellular vesicle marker CD63 (International Society of Extracellular Vesicles positioning, Thery et al 2018, doi: 10.1080/20013078.2018.1535750; Alonso et al. 2011) as a general exosome reporter readout for WB, immunofluorescence (MVBs, exosomes) and super-resolution experiments. Accordingly, the GFP-CD63 reporter plasmid was used for exosome secretion in transient expression studies and living cell time-lapse experiments (Suppl. Video 8). Any other exosome marker would also be present in Raji cells and would not allow us to analyse exclusively the secretion of exosomes by the effector Jurkat cells, since B lymphocytes produce a large quantity of exosomes upon MHC‑II stimulation by Th lymphocytes (Calvo et al, 2020, doi:10.3390/ijms21072631). To reinforce the exosome data in the context of the immune synapse, STED super-resolution imaging of the immune synapses made by control and FMNL1‑interfered cells was performed.
      Nanosized (100-150 nm) CD63+ vesicles can be found in the synaptic cleft of control cells with polarized MVBs, whereas we could not detect these vesicles in the synaptic cleft from FMNL1-interfered cells that maintain unpolarized MVBs (new Fig. 10).

      Moreover, not all the videos are fully illustrative. For example, in video 2 it would be more appropriate to show only the z plane corresponding to the IS to see more precisely the F-actin remodeling relative to CD63 labeling.

      Referee is right. It is true that the upper rows in some videos may distract the reader from the main message contained in the lower row, which includes the 90º turn-generated zx plane corresponding to the IS interface. Accordingly, we have maintained the still images of the whole synaptic conjugates in the first row of video 2; this will allow the reader to perceive a general view of the fluorochromes on the whole cell conjugates, as a reference, and to compare precisely the F-actin remodeling relative to CD63 labeling only at the zx interface (lower row). We have now processed videos 1 and 5 following similar criteria.

      The quality of videos 3 and 4 is not good enough. For video 7, it seems that the labeling of phospho-Ser is very broad at the IS, which is expected since it should label all the proteins that are phosphorylated by PKCs. The resolution of microscopy (at best 200 to 300 nm) does not allow us to conclude on the co-localization of FMNL1b with phospho-Ser and is thus not conclusive. Finally, the study would benefit from a more careful statistical analysis. The dot plots showing polarity are presented for one experiment. Yet, the distribution of the polarity is broad. Results of the 3 independent experiments should be shown and a statistical analysis performed on the independent experiments.

      Referee is right, we have amended video settings (brightness/contrast) in videos 3 and 4 to improve this issue. In addition, we would like to remark that the translocation of proteins to cellular substructures in living cells is not a trivial issue, since certain protein localizations are too dynamic to be properly imaged with enough spatial resolution. The equilibrium resulting from the association/dissociation of a certain protein to the membrane, in addition to the protein diffusion naturally occurring in living cells, as well as signal intensity fluctuations inherent to the stochastic nature of fluorescence emission often provide barriers for image quality (Shroff et al, 2024). Thus, additional image blurring is expected when compared with that observed in fixed samples. However, we think it is important to provide the potential readers with a dynamic view of FMNL1 localization, which can only be achieved through real-time videos, in addition to the still frames from the same videos provided in Fig. 6A (the referee did not argue against the inclusion of these frames), together with images from fixed cells in Fig 6B, for comparison. This is the reason why we have preferred to maintain the improved videos to complement the results of some spare frames from the videos, together with images from fixed cells in the same figure (Fig. 6).

      Regarding video 7, we agree that colocalization is limited by the spatial resolution of confocal microscopy, and this fact does not allow us to infer that FMNL1beta is phosphorylated at the IS. However, please note that we have never concluded this in our manuscript. Instead, we claimed that “colocalization of endogenous FMNL1 and YFP‑FMNL1βWT with anti‑phospho‑Ser …is compatible with the idea that both endogenous FMNL1 and YFP‑FMNL1βWT are specifically phosphorylated at the cIS”. Moreover, we have now performed colocalization in super‑resolved STED microscopy images, which reduces the XY resolution down to 30-40 nm (Suppl. Fig. S12), and the results also support colocalization of endogenous FMNL1 with anti-phospho‑Ser PKC at the IS within a 30 nm resolution limit. We have now somewhat softened our conclusion: “Although all these data did not allow us to infer that FMNL1β is phosphorylated at the IS due to the resolution limit of confocal and STED microscopes, the results are compatible with the idea that both endogenous FMNL1 and YFP-FMNL1βWT are specifically phosphorylated at the cIS”.

      Regarding statistical analyses, we agree that the dot distribution in the polarity experiments is quite broad, but this is consistent with the end-point strategy used by many research groups (including ourselves) to image an intrinsically stochastic, rapid and asynchronous process such as immune synapse formation and to score MTOC/MVB polarization (Calvo et al 2018, https://doi.org/10.3389/fimmu.2018.00684). Despite this fact, ANOVA analyses have underscored the statistical significance of all the experiments represented by dot plots. We cannot average or perform meta-statistical analyses by combining the equivalent cohort results from independent experiments, since we have observed that small variations of certain variables (SEE concentration, cell recovery, time after transfection, etc.) affect synapse formation and PI values among experiments without altering the final outcome in each case. Please note that our manuscript now includes 10 multi‑panel figures, 12 multi‑panel supplementary figures and 8 videos, and it is already quite large. Thus, we feel that including redundant, triplicate dot plot figures would dilute the main message of our already comprehensive contribution and distract potential readers from it. We have now included new sentences in the figure legends to note that ANOVA analyses were performed separately for each of the 3 independent experiments.
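      As a rough illustration of what analysing each independent experiment separately (rather than pooling cohorts) can look like, the sketch below computes a one-way ANOVA F statistic per experiment. The function name, the experiment labels and the numeric values are all hypothetical toy inputs, not data or code from the manuscript; the sketch returns only F and its degrees of freedom, and a p-value would additionally require the F distribution from a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for illustration only (hypothetical, not real PI measurements).
experiments = {
    "experiment_1": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "experiment_2": [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]],
}

# Each experiment is tested on its own; values are never pooled across experiments.
results = {name: one_way_anova_f(groups) for name, groups in experiments.items()}
```

      Keeping the experiments in separate entries makes it explicit that between-experiment variation (for example in SEE concentration or time after transfection) never enters the within-group variance of any single test.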

      Reviewer #2 (Recommendations For The Authors):

      (1) The key findings should be validated on primary CD4+ T cells (of which Jurkat is a transformed model).

      Referee is right. However, as commented by the other referee, the data from activating surfaces clearly show that the synaptic actin architecture of the immune synapse from primary CD8+ T cells is essentially indistinguishable from that of Jurkat T cells, but different from that of primary CD4+ cells (Murugesan, 2016). Thus, our data in Jurkat T cells are directly applicable to the synaptic architecture of primary CD8+ cells. In addition, to extend our results in Jurkat cells forming IS, we have performed experiments using synapses established by Raji cells and either primary T cells (TCR-mediated) or primary CAR T cells (CAR-mediated) (new Suppl. Fig. S7). We have preferred to work with mixed CD4+ and CD8+ cells in order to maintain potential interactions in trans between these subpopulations that may affect or influence IS formation. These experiments clearly show the presence of FMNL1 at these two different IS classes (new Suppl. Fig. S7), similar to what was found in Jurkat-Raji synapses. Moreover, since most of the experiments were performed in Jurkat cells as stated by the referee, we have changed the title of our manuscript, to circumscribe our results to the model we have used and to be faithful to the main body of our results.

      (2) The image of wt YFP-FMNL1beta in Figure 4A displays a weak CD63 signal and shows an asymmetric polarization of both the centrosome and MVBs. It should be replaced with a more representative one.

      Referee is right. Accordingly, we have modified the CD63 channel settings (brightness/contrast) in this panel to make it comparable to the other panels in the same figure. In addition, thanks to this referee's comment, we have realized that the position of the MTOC (yellow dot) in the diagram on the right side of the YFP-FMNL1betaWT panel row appeared mislocated, producing the mentioned apparent asymmetry with respect to the position of the MVBs' center of mass (green dot). This mistake leads to an apparent segregation between the positions of the centers of mass of these organelles, which certainly does not correspond to the real image. We have now amended the scheme and we apologize for this mistake.

      (3) The images showing F-actin clearance at the IS (Figure 8, S4, S5) are not very convincing, also when looking at the MFI along the T cell-APC interface in the en-face views. Since the F-actin signal also includes some signal from the APC, transfecting T cells with an actin reporter to selectively image T cell actin could better clarify this key point.

      Referee's point is correct. However, we (83) and other researchers using the proposed actin reporter approach in the same Raji/Jurkat IS model (Fig. 4 in ref 84) have already excluded the possibility that the actin cytoskeleton of Raji cells contributes to the measurements of synaptic F-actin. In Materials and Methods, page 37, lines 1048-1055, we included this related sentence: “It is important to remark that MHC-II-antigen triggering on the B cell side of the Th synapse does not induce noticeable F-actin changes along the synapse (i.e. F-actin clearing at the central IS), in contrast to TCR stimulation on T cell side (84) (85) (3). In addition, we have observed that majority of F-actin changes along the IS belongs to the Jurkat cell (83). Thus, the contribution to the analyses of the residual, invariant F-actin from the B cell is negligible using our protocol (83).”

      Thus, we can exclude that this caveat affects our results.

      (4) A similar consideration applies to the MVB distribution in the en‑face images. For example, in Figure S5 the MVB profile, with some peripheral distribution, does not appear very different in cells expressing wt YFP‑tagged FMNL1beta versus the S1086A‑expressing cells.

      The referee's assessment regarding Supp. Figure S5 is valid. Using only the plot profile, the outcomes obtained with YFP-FMNL1βWT may appear comparable to those derived from YFP-FMNL1βS1086A. Nonetheless, this resemblance is attributed to the plot profile's exclusive consideration of the MVBs signal in the interface from the immune synapse region (white rectangle). The upper images (second row), where the whole cell is displayed, illustrate that in YFP-FMNL1βWT, MVB are specifically accumulated within this specific region, in contrast to the scattered distribution observed in YFP-FMNL1βS1086A, where MVB are dispersed throughout the cell without distinction. While MVBs are evident in both instances within the synapse region, the reason behind this observation is different. The YFP-FMNL1βWT transfected cell (third column) shows a pronounced MVB concentration within the synaptic area (white rectangle), which leads to MVB PI=0.52, whereas the YFP-FMNL1βS1086A transfected cell (fourth column), as it presents a scattered distribution of MVB throughout the cell, also exhibits some MVB (but only a small proportion of the total cellular MVB) in the synaptic area, which yields MVB PI=-0.09. Please realise that the position of the center of mass of the distribution of MVB (MVBC) labelled in this figure (white squares) is an unbiased parameter that mirrors MVB center of mass polarization. A new sentence has been included in the figure legend to clarify this important point.

      (5) The image in the first row in Figure 6B does not show a clear accumulation of FMNL1beta at the IS, possibly because the T cell is in contact with two APCs. This image should be replaced.

      Referee is right. Therefore, we have replaced the quoted example with a single cell:cell synapse that shows a clearer and more localized accumulation in the cIS, thereby avoiding the mentioned caveat.

      (6) In Figure 2A the last row shows what appears to be a T:T cell conjugate (with one cell expressing the YFP-tagged protein). The image should be replaced with another showing a T cell-APC (blue) conjugate.

      Referee is right, we have accordingly replaced the mentioned image with a T cell:APC conjugate.

      (7) The Discussion is very long and dispersive. It would benefit from shortening it and making it more focused.

      Referee is right, we have shortened and focused it, by eliminating the whole second and third paragraphs of the discussion. Moreover, a whole paragraph in page 24 has been also deleted.

      We have also focussed the discussion towards the new data in primary T lymphocytes.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper uses a model of binge alcohol consumption in mice to examine how the behaviour and its control by a pathway between the anterior insular cortex (AIC) to the dorsolateral striatum (DLS) may differ between males and females. Photometry is used to measure the activity of AIC terminals in the DLS when animals are drinking and this activity seems to correspond to drink bouts in males but not females. The effects appear to be lateralized with inputs to the left DLS being of particular interest. 

      Strengths: 

      Increasing alcohol intake in females is of concern and the consequences for substance use disorder and brain health are not fully understood, so this is an area that needs further study. The attempt to link fine-grained drinking behaviour with neural activity has the potential to enrich our understanding of the neural basis of behaviour, beyond what can be gleaned from coarser measures of volumes consumed etc. 

      Weaknesses: 

      The introduction to the drinking in the dark (DID) paradigm is rather narrow in scope (starting line 47). This would be improved if the authors framed this in the context of other common intermittent access paradigms and gave due credit to important studies and authors that were responsible for the innovation in this area (particularly studies by Wise, 1973 and returned to popular use by Simms et al 2010 and related papers; e.g., Wise RA (1973). Voluntary ethanol intake in rats following exposure to ethanol on various schedules. Psychopharmacologia 29: 203-210; Simms, J., Bito-Onon, J., Chatterjee, S. et al. Long-Evans Rats Acquire Operant Self-Administration of 20% Ethanol Without Sucrose Fading. Neuropsychopharmacol 35, 1453-1463 (2010).)

      We appreciate the reviewer’s perspective on the history of the alcohol research field. There are hundreds of papers that could be cited regarding all the numerous different permutations of alcohol drinking paradigms. This study is an eLife “Research Advances” manuscript that is a direct follow-up study to a previously published study in eLife (Haggerty et al., 2022) that focused on the Drinking in the Dark model of binge alcohol drinking. This study must be considered in the context of that previous study (they are linked), and thus we feel that a comprehensive review of the literature is not appropriate for this study.

      The original drinking in the dark demonstrations should also be referenced (Rhodes et al., 2005). Line 154 Theile & Navarro 2014 is a review and not the original demonstration. 

      This is a good recommendation. We have added this citation to Line 33 and changed Line 154.

      When sex differences in alcohol intake are described, more care should be taken to be clear about whether this is in terms of volume (e.g. ml) or blood alcohol levels (BAC, or at least g/kg as a proxy measure). This distinction was often lost when lick responses were being considered. If licking is similar (assuming a single lick from a male and female brings in a similar volume?), this might mean males and females consume similar volumes, but females due to their smaller size would become more intoxicated so the implications of these details need far closer consideration. What is described as identical in one measure, is not in another. 

      As shown in Figure 1, all measures of intake are reported as g/kg for both water and alcohol to assess intakes across fluids that are controlled by body weights. We do not reference changes in fluid volume or BACs to compare differences in measured lickometry or photometric signals, except in one instance where we suggest that the total volume of water (ml) is greater than the total amount of alcohol (ml) consumed in DID sessions, but this applies generally to all animals, regardless of sex, across all the experimental procedures.

      In Figure 2 – Figure Supplement 1 we show drinking microstructures across single DID sessions, and that males and females drink similarly, but not identically, when assessing drinking measures at the smallest timescale that we have the power to detect with the hardware we used for these experiments. Admittedly, the variability seen in these measures is certainly non-zero, and while we are tempted to assume that there exist at least some singular drinks that occur identically between males and females in the dataset that support the idea that females are simply consuming more volume of fluid per singular drink, we don’t have the sampling resolution to support that claim statistically. Further, even if females did consume more volume per singular drink than males, we do not believe that is enough information to make the claim that such behavior leads to more “intoxication” in females compared to males, as we know that alcohol behaviors, metabolism, and uptake/clearance all differ significantly by sex and are contributing factors towards defining an intoxication state. We’ve amended the manuscript to remove any language describing these drinking behaviors as identical.

      No conclusions regarding the photometry results can be drawn based on the histology provided. Localization and quantification of viral expression are required at a minimum to verify the efficacy of the dual virus approach (the panel in Supplementary Figure 1 is very small and doesn't allow terminals to be seen, and there is no quantification). Whether these might differ by sex is also necessary before we can be confident about any sex differences in neural activity. 

      We provide hit maps of our fiber placements and viral injection centers, as we and many other investigators regularly do for publication based on histological verification. Figure 1A clearly shows the viral strategy taken to label AIC to DLS projections with GCaMP7s, and a representative image shows green GCaMP-positive terminals below the fiber placement. In the experiments, animals without proper viral expression displayed little or no GCaMP signal, which serves as an additional expression-based control alongside the typical histology performed to confirm “hits”. Animals with poor expression or obvious misplacement of the fiber probes were removed as described in the methods. Further, we report our calcium signals as z-scored changes in observed fluorescence; thus we are comparing scaled averages of signals across sexes and days, which helps minimize any differences between “low” or “high” viral transduction levels at the terminals directly underneath the tips of the fibers.
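      To make the scaling argument concrete, here is a minimal sketch of z-scoring a fluorescence trace. The function name and the two toy traces are hypothetical and are not the processing pipeline used in the study; the sketch only shows that two recordings with the same shape but different raw amplitude (as might result from different expression levels) map onto the same z-scored trace.

```python
from statistics import mean, pstdev


def zscore_trace(trace, baseline=None):
    """Z-score a trace using the mean and SD of `baseline` (or of the trace itself)."""
    reference = trace if baseline is None else baseline
    mu = mean(reference)
    sd = pstdev(reference)
    if sd == 0:
        raise ValueError("reference has zero variance; z-score is undefined")
    return [(x - mu) / sd for x in trace]


# Toy traces (hypothetical): same shape, tenfold difference in raw amplitude.
low_expression = [1.0, 1.2, 0.8, 1.0, 2.0]
high_expression = [10.0, 12.0, 8.0, 10.0, 20.0]

z_low = zscore_trace(low_expression)
z_high = zscore_trace(high_expression)
```

      Note that z-scoring removes differences in offset and scale only; it does not correct for differences in signal-to-noise ratio, which is why histological verification remains a separate control.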

      While the authors have some previous data on the AIC to DLS pathway, there are many brain regions and pathways impacted by alcohol and so the focus on this one in particular was not strongly justified. Since photometry is really an observational method, it's important to note that no causal link between activity in the pathway and drinking has been established here. 

      As mentioned above, this article is an eLife Research Advances article that builds on our previous AIC to DLS work published in eLife (Haggerty et al., 2022). Considering that this is a linked article, a justification for why this brain pathway was chosen is superfluous. In addition, an exhaustive review of all the different brain regions and pathways that are affected by binge alcohol consumption to justify this pathway seems more appropriate to a review article than an article such as this.  

      We make no claims that photometric recordings are anything but observational, but we did observe these signals to be different when time-locked to the beginning of drinking behaviors. We describe this link between activity in the pathway and drinking throughout the manuscript. It is indeed correlational, but just because it is not causal does not mean that our findings are invalid or unimportant.

      It would be helpful if the authors could further explain whether their modified lickometers actually measure individual licks. While in some systems contact with the tongue closes a circuit which is recorded, the interruption of a photobeam was used here. It's not clear to me whether the nose close to the spout would be sufficient to interrupt that beam, or whether a tongue protrusion is required. This detail is important for understanding how the photometry data is linked to behaviour. The temporal resolution of the GCaMP signal is likely not good enough to capture individual licks but I think more caution or detail in the discussion of the correspondence of these events is required. 

      The lickometers do not capture individual licks, but a robust quantification of the information they capture is described in Godynyuk et al. 2019 and referenced in multiple other papers (Flanigan et al. 2023, Haggerty et al. 2022, Grecco et al. 2022, Holloway et al. 2023) where these lickometers have been used. However, individual lick tracking is not a requirement for tracking drinking behaviors more generally. The lickometers clearly track when the animals are at the bottles drinking fluids, and we have used the start of that lickometer signal to time-lock our photometry signals to drinking behaviors. We make no claims about, and have no data on, how photometric signals may be altered on the timescale of single licks. With regard to how AIC to DLS signals change on the timescale of seconds when animals initiate drinking behaviors, we believe we explain these signals with caution and in the context of the behaviors they aim to describe.
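
      As a rough illustration of what time-locking to the start of a lickometer signal involves, the following sketch cuts a window around each onset and averages across events. The function name, sampling rate, window lengths, and synthetic trace are hypothetical and are not taken from the study's analysis pipeline.

```python
def peri_event_windows(signal, sample_rate_hz, event_times_s, pre_s, post_s):
    """Cut a window around each event onset and average across events."""
    pre = int(round(pre_s * sample_rate_hz))
    post = int(round(post_s * sample_rate_hz))
    windows = []
    for t in event_times_s:
        onset = int(round(t * sample_rate_hz))
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(signal):
            continue  # skip events too close to the recording edges
        windows.append(signal[start:stop])
    if not windows:
        return [], []
    n = len(windows)
    average = [sum(w[i] for w in windows) / n for i in range(pre + post)]
    return windows, average


# Hypothetical 10 Hz trace with a 1 s step up at each drinking onset.
signal = [0.0] * 100
for onset_index in (20, 60):
    for i in range(onset_index, onset_index + 10):
        signal[i] = 1.0

windows, average = peri_event_windows(
    signal, 10, [2.0, 6.0], pre_s=1.0, post_s=1.0
)
```

      Because each window is anchored to the onset of a drinking event rather than to individual licks, this kind of alignment only requires a reliable event start time.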

      Even if the pattern of drinking differs between males and females, the use of the word "strategy" implies a cognitive process that was never described or measured. 

      We use the word strategy to describe a plan of action that is executed by some chunking of motor sequences that amounts to a behavioral event, in this case drinking a fluid. We do not mean to imply anything further than this by using this specific word.

      Reviewer #2 (Public Review): 

      Summary: 

      This study looks at sex differences in alcohol drinking behaviour in a well-validated model of binge drinking. They provide a comprehensive analysis of drinking behaviour within and between sessions for males and females, as well as looking at the calcium dynamics in neurons projecting from the anterior insula cortex to the dorsolateral striatum. 

      Strengths: 

      Examining specific sex differences in drinking behaviour is important. This research question is currently a major focus for preclinical researchers looking at substance use. Although we have made a lot of progress over the last few years, there is still a lot that is not understood about sex-differences in alcohol consumption and the clinical implications of this. 

      Identifying the lateralisation of activity is novel, and has fundamental importance for researchers investigating functional anatomy underlying alcohol-driven behaviour (and other reward-driven behaviours). 

      Weaknesses: 

      Very small and unequal sample sizes, especially females (9 males, 5 females). This is probably ok for the calcium imaging, especially with the G-power figures provided, however, I would be cautious with the outcomes of the drinking behaviour, which can be quite variable. 

      For female drinking behaviour, rather than this being labelled "more efficient", could this just be that female mice (being substantially smaller than male mice) just don't need to consume as much liquid to reach the same g/kg. In which case, the interpretation might not be so much that females are more efficient, as that mice are very good at titrating their intake to achieve the desired dose of alcohol. 

      We agree that the “more efficient” drinking language could be bolstered by additional discussion in the text, and thus have added this to the manuscript starting at line 440.

      I may be mistaken, but is ANCOVA, with sex as the covariate, the appropriate way to test for sex differences? My understanding was that with an ANCOVA, the covariate is a continuous variable that you are controlling for, not looking for differences in. In that regard, given that sex is not continuous, can it be used as a covariate? I note that in the results, sex is defined as the "grouping variable" rather than the covariate. The analysis strategy should be clarified. 

      In lines 265-267, we explicitly state that the covariate factor was sex, which is mathematically correct based on the analyses we ran. We made an in-text error where we referred to sex as a grouping variable on Line 352, when it should have been the covariate. Thank you for the catch and we have corrected the manuscript.

      But, to reiterate, we are attempting to determine if the regression fits by sex are significantly different, which would be reported as a significant covariate. Sex is certainly a categorical variable, but the two measures against which we are comparing the sexes are continuous, so we believe it is valid to run an ANCOVA here.
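
      For readers less familiar with this kind of comparison, the sketch below computes an ANCOVA-style F statistic for a group effect (for example, sex) on one continuous measure after adjusting for another. It is an illustration under stated assumptions, not the analysis run for the manuscript: it assumes a common slope across groups, the function names are hypothetical, and the data are made up.

```python
def _sums(xs, ys):
    """Centered sums of squares and cross-products for one sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_group_f(groups):
    """F statistic for a group effect on y after adjusting for x.

    `groups` maps a label to (xs, ys). Assumes a common slope across groups.
    """
    within = [_sums(xs, ys) for xs, ys in groups.values()]
    sxx_w = sum(s[0] for s in within)
    sxy_w = sum(s[1] for s in within)
    syy_w = sum(s[2] for s in within)
    rss_full = syy_w - sxy_w ** 2 / sxx_w  # separate intercepts, shared slope

    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    sxx_t, sxy_t, syy_t = _sums(all_x, all_y)
    rss_reduced = syy_t - sxy_t ** 2 / sxx_t  # a single line for all animals

    k, n = len(groups), len(all_x)
    df_num, df_den = k - 1, n - k - 1
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)


# Made-up data: same slope in both groups, but group "B" is shifted upward.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
f_shifted = ancova_group_f({"A": (x, y), "B": (x, [v + 5 for v in y])})
f_identical = ancova_group_f({"A": (x, y), "B": (x, list(y))})
```

      A large F indicates that a single regression line fits the pooled data worse than lines with group-specific intercepts. Testing whether the slopes themselves differ would require an additional interaction term, which this sketch omits.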

      Reviewer #3 (Public Review): 

      Summary: 

      In this manuscript by Haggerty and Atwood, the authors use a repeated binge drinking paradigm to assess how water and ethanol intake changes in male and female mice as well as measure changes in anterior insular cortex to dorsolateral striatum terminal activity using fiber photometry. They find that overall, males and females have similar overall water and ethanol intake, but females appear to be more efficient alcohol drinkers. Using fiber photometry, they show that the anterior insular cortex (AIC) to dorsolateral striatum (DLS) projections have sex, fluid, and lateralization differences. The male left circuit was most robust when aligned to ethanol drinking, and water was somewhat less robust. Male right, and female left and right, had essentially no change in photometry activity. To some degree, the changes in terminal activity appear to be related to fluid exposure over time, as well as within-session differences in trial-by-trial intake. Overall, the authors provide an exhaustive analysis of the behavioral and photometric data, thus providing the scientific community with a rich information set to continue to study this interesting circuit. However, although the analysis is impressive, there are a few inconsistencies regarding specific measures (e.g., AUC, duration of licking) that do not quite fit together across analytic domains. This does not reduce the rigor of the work, but it does somewhat limit the interpretability of the data, at least within the scope of this single manuscript. 

      Strengths: 

      - The authors use high-resolution licking data to characterize ingestive behaviors. 

      - The authors account for a variety of important variables, such as fluid type, brain lateralization, and sex. 

      - The authors provide a nice discussion on how this data fits with other data, both from their laboratory and others'. 

      - The lateralization discovery is particularly novel. 

      Weaknesses: 

      - The volume of data and number of variables provided makes it difficult to find a cohesive link between data sets. This limits interpretability.

      We agree there is a lot of data and variables within the study design, but also believe it is important to display the null and positive findings together to describe the changes we measured holistically across water and alcohol drinking.

      - The authors describe a clear sex difference in the photometry circuit activity. However, I am curious about whether female mice that drink more similarly to males (e.g., less efficiently?) also show increased activity in the left circuit, similar to males. Oppositely, do very efficient males show weaker calcium activity in the circuit? Ultimately, I am curious about how the circuit activity maps to the behaviors described in Figures 1 and 2. 

      In Figure 3C, we show that across the time window of drinking behaviors, female mice who drink alcohol do have higher baseline calcium activity than water-drinking female mice, so we believe there are certainly alcohol-induced changes in AIC to DLS within females, but there remains a lack of engagement (as measured by changes in amplitude) compared to males. So, when comparing consummatory patterns that are similar by sex, we still see the lack of calcium signaling near the drinking bouts, but small shifts in baseline activity that we aren’t truly powered to resolve (using an AUC or similar measurements for quantification) because the shifts are so small. Ultimately, we presume that the AIC to DLS inputs in females aren’t the primary node for encoding this behavior, and some recent work out of David Werner’s group (Towner et al. 2023) suggests that for males who drink, the AIC becomes a primary node of control, whereas in females, the PFC and ACC are more engaged. Thus, the mapping of the circuit activity onto the drinking behaviors more generally represented in Figures 1 and 2 may be sexually dimorphic, and further studies will be needed to resolve how females engage differential circuitry to encode ongoing binge drinking behaviors.

      - What does the change in water-drinking calcium imaging across time in males mean? Especially considering that alcohol-related signals do not seem to change much over time, I am not sure what it means to have water drinking change. 

      The AIC seems to encode many physiologically relevant, interoceptive signals, and the water drinking in males was puzzling to us as well. Currently, we think it may reflect both the animals becoming more efficient at drinking out of the lickometers in early weeks and signaling changes due to thirst states or taste associated with the fluid. This is speculation, and more in-depth studies are needed to determine how thirst states or taste may modulate AIC to DLS inputs, but we believe that is beyond the scope of the current study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 45 - states alcohol use rates are increasing in females across the past half-decade. I thought this trend was apparent over the past half-century? Please consider revising this. 

      According to NIAAA, the gap between rates of alcohol consumption in females and males has been closing for about the past 100 years, but only recently have those trends started to reverse, with females drinking similar amounts to, or more than, males.

      Placing more of the null findings into supplemental data would make the long paper more accessible to the reader. 

      In reference to Reviewer 3’s point as well, there is a lot of data we present, and we hope others will use this data, both null and positive findings, in their future work. As formatted on eLife’s website, we think it is important to place these findings in-line as well.

      Reviewer #2 (Recommendations For The Authors): 

      In addition to the points raised about analysis and interpretation in the Public Review, I have a minor concern about the written content. I find the final sentence of the introduction "together these findings represent targets for future pharmacotherapies.." a bit unjustified and meaningless. The findings are important for a basic understanding of alcohol drinking behaviour, but it's unclear how pharmacotherapies could target lateralised AIC inputs into DLS. 

      There are on-going studies (CANON-Pilot Study, BRAVE Lab, Stanford) for targeted therapies that use technologies like TMS and focused ultrasound to activate the AIC to alleviate alcohol cravings and decrease heavy drinking days. The difficulty with these next-generation therapeutics is often targeting, and thus we think this work may be of use to those in the clinic to further develop these treatments. We agree that this data does not support the development of pharmacotherapies in a traditional sense, and thus have removed the word and added text to reference TMS and ultrasound approaches to bolster this statement in lines 101+.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We appreciate the feedback provided and refer to our previous response for detailed explanations regarding our decisions on some of the recommendations made by the referees and editors. We have introduced changes as follows:

      • We added a supplementary Figure to Figure 5 to show inhibition by Astemizole at the single channel level.

      • We have corrected Figure 7A, where the normalized current did not reach 1 as a maximum. We had overlooked that this is expected when the prepulse was -160 mV, and the IV is strongly biphasic, but not when coming from -100 mV. We are thankful for this observation, which served to identify that the values for one of the cells were inverted with respect to the others (the sequence of stimuli was different during recording, and this information got lost in the analysis procedure). We have corrected this and made sure that such a mistake had not happened anywhere else.

      • Finally, we have corrected a typo in the discussion, as indicated in the review.

      We include a version with changes marked and a clean version of the manuscript.

  2. Aug 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present a useful analysis of the phenotype of sheep in which the muscle developmental regulator myostatin has been mutated in a FGF5 knockout background. The goal was to produce sheep with a "double-muscled" phenotype, yet the genetically engineered sheep exhibited meat with a smaller cross-sectional area and higher number of muscle fibers. The work extends the extensive body of knowledge already published in this area. The authors provide evidence using in vitro experiments that Fosl1 regulates myogenesis, but the strength of evidence relating to the muscle phenotype and underlying cellular and molecular mechanism is inadequate.

      Thank you for the assessment. According to the reviewers' comments, we have supplemented and updated the data on muscle phenotypes, and the molecular mechanisms have also been supplemented accordingly, such as FOSL1 silencing and inhibition, as well as possible secondary fusion of myoblasts regulated by calcium signaling. Meanwhile, considering the suggestions of editors and reviewers, we have also supplemented the data on serum MSTN regulation. Given that the phenotype of MSTN gene editing is mutation site dependent, we directly cultured skeletal muscle satellite cells using serum from WT and MF+/- sheep, and showed that serum regulation cannot be ignored after MSTNDel273C mutation with FGF5 knockout.

      Public Review:

      Chen and collaborators first analysed in sheep embryonic gene editing using CRISPR-Cas9 technology to invalidate the two alleles of Mstn and Fgf5 genes by using different ratios of Cas9 mRNA and sgRNA. They showed that a ratio of 1:10 had highest efficiency and they successfully generated two sheep with biallelic mutations of both genes. Materials and Methods on the generation of gene-edited sheep is entirely missing. The data on these gene-edited sheep have been already published twice by the authors in different contexts. Other groups reported on gene editing of Mstn or Fgf5 in sheep embryos and the resulting phenotypes.

      We thank the reviewers for pointing out our negligence and shortcomings. We have provided detailed information on the method used to generate the gene-edited sheep in the Materials and Methods. Briefly, gene-edited sheep were produced by injecting MSTN sgRNA, FGF5 sgRNA, and Cas9 mRNA into embryos at different ratios.

      Although the findings are interesting, they do not provide sufficiently new scientific information or advancements in producing genetically modified livestock with improved production characteristics. While the MSTNDel273 sheep exhibited an increased number of muscle fibers, the data provided did not demonstrate a significant improvement in meat production, quality or quantity in the MSTNDel273 sheep vs WT.

      Thank you very much for your constructive comments. Considering the lack of data on improving production traits, we have further supplemented the data on meat yield and quality of MSTNDel273C mutation with FGF5 knockout sheep in Table S6-10. Although these improvements were not significant enough, our data showed increased meat production traits in MSTNDel273C mutation with FGF5 knockout sheep, such as the proportion of hind leg meat to carcass and the proportion of gluteus medius to carcass. For example, the proportion of hind leg meat was significantly increased by 21.2% (Table S7), and the proportion of gluteus medius in the carcass of MF+/- sheep was significantly (P<0.01) increased by 26.3% compared to WT sheep (Figure 2K). In addition, there were no significant (P>0.05) differences in pH, color, drip loss, cooking loss, shearing force, and amino acid content of the longissimus dorsi between WT and MF+/- sheep (Table S8-10). All these results demonstrated that the MSTNDel273C mutation with FGF5 knockout sheep had well-developed hip muscles with smaller muscle fibers, which do not affect meat quality, and this phenotype may be dominated by MSTN gene.

      The authors indicate that sgRNA design changes in addition to changing the molar ratio of Cas9 mRNA:sgRNA improved the ability to generate biallelic homozygous mutant sheep; however, the data provided do not demonstrate any significant difference. Given the small number of sheep that were actually produced and evaluated, it is extremely difficult to demonstrate anything that was analyzed to be significantly (statistically) different between MSTNDel273 sheep and WT, yet the authors seem to ignore this in much of their discussion. There is no explanation as to why the authors started with sheep that were FGF5 knockouts. The reviewer assumes that this was simply a line of sheep available from previous studies and the goal was to produce sheep with both improved hair/wool characteristics in addition to improved muscle development. However, the use of FGF5 knockout sheep complicates the ability to accurately decipher the unique aspects associated with targeting only myostatin for knock-out. At minimum, this is a variable that has to be considered in the statistical analysis. No information is provided on the methods used to produce the MSTNDel273 sheep, which is fundamentally important. It is assumed they were produced by injecting one-cell zygotes then transferring these into surrogate females. The methods employed might have a profound effect on the outcome.

      We greatly appreciate your review. In the current study, we did not discuss the impact of changes in sgRNA design on the ability to generate biallelic homozygous mutant sheep. Rather, we focused on the delivery molar ratio of Cas9 mRNA to sgRNA and found that increasing the molar ratio of Cas9:sgRNA can improve the ability to produce homozygous biallelic mutations in sheep. We apologize for neglecting the statistical analysis; in the revised version, the significance of the differences was tested with the chi-square test. Other restrictions related to the number of sheep actually produced and evaluated are analyzed in our additional discussion. We should clarify that the gene-edited sheep we produced did not start from FGF5 knockout sheep. As hypothesized by the reviewers, we used a one-step method to simultaneously edit the two genes, MSTN and FGF5, to concomitantly increase muscle yield and improve wool characteristics in sheep, which resulted in knockout of the FGF5 gene and mutation of the MSTN gene. As the reviewers speculated, the MSTNDel273C mutation with FGF5 knockout sheep were generated by injecting sgRNA and Cas9 mRNA targeting MSTN and FGF5 into single fertilized eggs, which were then transplanted into surrogate mothers. We have provided detailed information on the generation method of the gene-edited sheep in the Materials and Methods section.
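
      To make the form of such a test concrete, here is a minimal sketch of a Pearson chi-square statistic for a 2x2 table comparing two injection ratios. The counts are hypothetical and are NOT the study's data, and the function name is introduced only for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts, NOT the study's data:
# rows = two Cas9 mRNA:sgRNA ratios, columns = (biallelic, not biallelic)
stat = chi_square_2x2(8, 12, 3, 17)
significant_at_5pct = stat > 3.841  # critical value of chi-square with 1 df
```

      With counts as small as those typical of large-animal editing experiments, expected cell frequencies can fall below the usual rule of thumb for the chi-square approximation, in which case an exact test may be the more cautious choice.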

      Authors genotyped one sheep with a biallelic three base pair deletion in Mstn exon 3 and a compound heterozygote mutation in Fgf5 with a 5 nucleotides deletion on one allele and 37 nucleotides deletion on the other allele, partially spanning over the same region. This sheep developed a double muscle phenotype, which was documented using photography and CT scan. The hair phenotype was not further addressed, but authors referred to a previous publication.

      Thank you for your review. In the current study, we only focused our perspective on the muscle phenotype, while the data on the hair phenotype involved another study. Therefore, we referred to our previous publication on hair phenotypes, in which the mutation locus in FGF5 gene-edited sheep is the same as in the current study.

      Authors performed morphometric studies on two distinct muscles, longissimus dorsi and gluteus medius, and found a profound fiber hypotrophy in the Mstn-/-;Fgf5-/- double mutants, with a shift from larger fiber diameter to smaller fiber sizes. Morphometric studies showed only a low percentage of fibers in wt and mutant sheep had fiber cross sectional areas larger than 800 µm², whereas about 30% in wt and about 60% in the mutant had CSA of <400 µm². The report of one case, without reproducing the phenotype in other sheep, is scientifically insufficient. The fiber sizes in wt sheep remain far below previously published reports in sheep (about 3-5 times smaller) and as compared to other species, which suggests a methodological error in morphometric methods.

      We greatly appreciate your careful review. There was indeed an error in the morphological analysis of the MF-/- sheep longissimus dorsi and gluteus medius muscles. After careful checking, we found that the fiber sizes in WT sheep were far below previously published reports in sheep because an incorrect scale had been used. We therefore re-scanned the tissue sections and recalculated the cross-sectional area of muscle fibers and the number of muscle fiber cells per unit area with the correct scale. With this correction, the average cross-sectional area of muscle fibers in WT sheep was approximately 1800 μm², which is consistent with the previous report. We once again thank the reviewer for such a careful and conscientious review. Considering the profound fiber hypotrophy in MSTNDel273C mutation with FGF5 knockout sheep pointed out by the reviewer, we performed a statistical analysis of the proportion of centrally nucleated myofibres in WT and MF+/- sheep, which can characterize the occurrence of muscle fiber hypotrophy. The results showed no significant difference in the proportion of centrally nucleated myofibres between WT and MF+/- sheep (Figure S2D). We also analyzed the mRNA expression levels of genes related to muscle fiber hypotrophy and muscle atrophy, such as MTM1, DMD, IGF1, SMN1, and GAA. Although the levels of MTM1, IGF1, SMN1, and GAA were significantly increased (Figure S2E), this elevation did not result in muscle fiber hypotrophy or muscle atrophy, but was beneficial for muscle formation. Therefore, we suggest that the phenomenon produced by the MSTNDel273C mutation with FGF5 knockout may not be muscle fiber hypotrophy, because the mutation significantly promotes the proliferation of sheep skeletal muscle satellite cells (Figure 3A-F) and, more importantly, the muscle phenotype of MF-/- and MF+/- sheep was improved, including the "double-muscle" phenotype of the rump (Figure 2A), the proportion of gluteus medius in the carcass (Figure 2K), and the proportion of hind leg meat (Table S7).

      The authors also investigated the influence of Fgf5 mutation on muscle development. They determined fiber cross sectional area in heterozygous Fgf5 mutant (number of investigated animals not given) and conclude that Mstn mutation but not Fgf5 mutation caused the double muscle phenotype. Results are insufficient to support this conclusion. Firstly, authors investigated heterozygous FGF5 sheep and not homozygous mutants. Secondly, FGF5 has previously been shown to stimulate expansion of connective tissue fibroblasts and to inhibit skeletal muscle development during limb embryonic development (Clase et al. 2000). Of note, Mstn is also expressed during embryonic development. A combined knockout could therefore entail synergistic effects and cause muscle hyperplasia that is not found in individual knockout, a hypothesis that was not addressed by the authors.

      Thank you very much for your critical review, which is very valuable for improving the quality of our manuscript. We have given the number of animals studied in all figure legends. Given the lack of MSTN and FGF5 single gene edited sheep, both homozygous and heterozygous, and especially of MSTN single gene edited sheep, we have weakened the view that MSTN mutations rather than FGF5 mutations lead to the “double-muscle” phenotype in the conclusion and discussion. As you have mentioned, our current data are indeed insufficient to support this conclusion. In addition, considering the expression of MSTN and FGF5 in embryonic development and their regulation of skeletal muscle development, we examined the expression of MSTN and FGF5 in individual development after MSTNDel273C mutation with FGF5 knockout (Figure S2A). However, these results are limited by the animals involved in embryonic development, especially single gene edited embryos. We greatly appreciate your very meaningful and valuable comments on the possible synergistic effects of combined knockdown. We will prepare MSTN and FGF5 single gene edited sheep to further explore possible synergistic effects in a subsequent study.

      The authors generated and studied an F1 generation of mutant sheep with heterozygous mutation in Mstn and Fgf5. In Mstn+/-;Fgf5+/-, gluteus medius muscle was found to be larger compared to wt sheep, whereas other muscles were smaller, and overall meat quantity did not change. Morphometric studies revealed a similar muscle fiber hypotrophy and muscle hyperplasia as in the Mstn-/-;Fgf5-/- gluteus muscle.

      Thank you for your comments. We found that the proportion of gluteus medius in MF+/- sheep was larger than that in WT sheep, and the proportion of hind leg meat was also significantly increased (Table S7). Morphological analysis showed that MF+/- sheep exhibited a myofiber hyperplasia phenotype similar to that of MF-/- sheep.

      In the next part of results, authors investigated the presence of myostatin protein in homozygous Mstn muscle using immunohistochemistry and found no differences compared to wt, however, positive and negative controls are missing. They also determined Mstn transcription and protein quantity using WB in heterozygous Mstn muscle and found no difference. The authors did not provide data to explain why the herein generated Mstn mutation causes muscle fiber hypotrophy, whereas most work on myostatin abrogation demonstrated fiber hypertrophy.

      Thank you very much for your constructive comments. Due to the lack of necessary positive and negative controls in the immunohistochemistry study, we decided to delete the immunohistochemistry data from the manuscript to further streamline it. In the current study, although mutations in MSTN led to a decrease in the cross-sectional area of individual fibers, the number of muscle fibers per unit area was increased, and the final result was an increase in muscle volume and a “double-muscle” phenotype, as well as an increase in the proportion of gluteus medius to carcass (Figure 2K) and the proportion of hind leg meat (Table S7). Importantly, there was no significant difference in the proportion of centrally nucleated myofibres between WT and MF+/- sheep (Figure S2D), and the elevated expression levels of the muscle fiber hypotrophy and muscle atrophy marker genes MTM1, IGF1, SMN1, and GAA are more beneficial for muscle health. Therefore, we maintain that this is not muscle fiber hypotrophy. As for the phenotype of muscle fiber hypertrophy demonstrated by most myostatin abrogation studies, we analyzed the possible reasons in the discussion, namely that the effect of MSTN mutation on muscle fiber phenotype may be mutation site-dependent.

      Authors then isolated myoblasts from hind limbs of 3-month-old sheep fetuses and cultured in presence of 20% fetal bovine serum before switching to differentiation medium containing 2% horse serum. The cultures showed increased proliferation of Mstn+/-;Fgf5+/- myoblasts as well as downregulation of genes associated with muscle differentiation as well as reduced fusion index. No experiments were performed to assure whether the myostatin and FGF5 pathways were inhibited. No control experiments using supplementation with recombinant proteins and using growth factor depleted culture supplements were performed. As FGF5 and myostatin are secreted factors, evidence is missing whether this led to conditioning of the culture medium. Of note, previous work in mice demonstrated that the double muscle phenotype developed independent of satellite cells activity (Amthor et al. 2009).

      We greatly appreciate your valuable suggestions. In addition to detecting the MSTN pathway at the cellular level, we also assayed the expression of MSTN receptors and downstream Smad and Jun families in the gluteus medius, and found that MSTNDel273C mutation with FGF5 knockout led to upregulation of two receptors, while the expression of downstream Smad and Jun families was also inhibited to varying degrees (Figure S4A). Considering the possible serum regulation, we also supplemented the data on serum MSTN regulation. Given that the phenotype of MSTN gene editing is mutation site dependent, we directly cultured skeletal muscle satellite cells using serum from WT and MF+/- sheep. We found that serum from MF+/- sheep promoted the proliferation of skeletal muscle satellite cells (Figure S4D). MSTNDel273C mutation with FGF5 knockout promoted FOSL1 expression using WT sheep serum (Figure S4E), which was similar to the results of FBS culture and HS induction. The serum from MF+/- sheep strongly stimulated FOSL1 expression and the inhibition of MyoD1 (Figure S4F). These results indicate that serum regulation cannot be ignored after MSTNDel273C mutation with FGF5 knockout.

      Authors then performed RNA seq from Mstn+/-;Fgf5+/- muscle and found a number of differentially expressed genes, but none has been previously reported being involved in the myostatin signaling pathway, so the authors chose to only focus on FOSL1 and associated genes. Authors then demonstrated that Pdpn and Ankrd2 were upregulated during myogenic differentiation, whereas FOSL1 was downregulated. Moreover, Fosl1 transcription was upregulated in myoblasts and myotubes from Mstn+/-;Fgf5+/- muscle. Authors showed an interaction between Fosl1 and Myod1. Moreover, authors demonstrated that Fosl1 directly binds to the Myod1 promoter. Authors also found decreased p38 MAPK protein levels in proliferating myoblasts from Mstn+/-;Fgf5+/- muscle and increased p38 MAPK in differentiating myotubes.

      In the revised version, we have streamlined this section by removing content such as PDPN, ANKRD2, and p38 MAPK, aiming to focus on the MEK-ERK-FOSL1 axis. Meanwhile, we further confirmed the regulatory effect of FOSL1 on MyoD1 by dual luciferase assay.

      Furthermore, gain-of-function by overexpressing FOSL1 promoted cell proliferation and inhibited differentiation, and tert-butylhydroquinone, an indirect activator of FOSL1 also inhibited myogenic differentiation. The findings do not support the idea that FOSL1 is not involved, but neither do they strongly support the involvement of FOSL1. The observations made by the authors could be co-incidental and not causative in nature.

      We greatly appreciate the valuable suggestions provided by the reviewers, which are of great significance for improving our manuscript. Considering the reviewers’ suggestions, we supplemented the FOSL1 loss-of-function experiments and found that interfering with FOSL1 can inhibit the proliferation and promote the differentiation of skeletal muscle satellite cells, which is the opposite of the results of FOSL1 overexpression (Figure 6). Meanwhile, we also used the inhibitor PD98059 to inhibit the ERK pathway and thereby indirectly inhibit the activity of FOSL1, and the results showed that inhibition of FOSL1 activity also promoted myogenic differentiation (Figure 7F-G). These results further support the important role of FOSL1.

      The manuscript by Chen et al. demonstrated successful gene editing in sheep embryos to obtain biallelic mutation of Mstn and FGF5. The resulting double muscle phenotype resulted from fiber hypotrophy and hyperplasia, which contradicts findings in the literature. Chen et al. generated F1 heterozygous offspring, in which Mstn transcription and translation did not change. Myoblasts from these animals showed increased proliferation and decreased differentiation, which the authors interpreted as the underlying cellular mechanism of the double muscle phenotype. However, no work on muscle development in these animals is presented. Important in vitro control experiments are missing. Chen and collaborators found Fosl1 as a differentially expressed gene in Mstn+/-;Fgf5+/- muscle. Fosl1 drives myoblast proliferation and has a direct regulatory effect on the Myod1 promoter. The cellular and molecular mechanism of Fosl1 during myogenesis is novel and supported by solid evidence. However, the data remain inadequate to conclude whether Fosl1 indeed acts downstream of myostatin.

      We greatly appreciate the reviewers' insightful and very constructive suggestions, which were very helpful for further improving our data. In our study, although the mutation in MSTN decreased the cross-sectional area of individual muscle fibers, the number of muscle fibers per unit area increased, which ultimately resulted in an increase in muscle size and the development of a "double-muscle" phenotype. We therefore argue that this is not a manifestation of muscle fiber dystrophy; the expression of marker genes for muscle fiber dystrophy and the proportion of muscle fibers with central nuclei also support this view (Figure S2E-F). In addition, our finding of a reduced cross-sectional area per muscle fiber contradicts the literature on muscle fiber hypertrophy, which may reflect phenotypic differences caused by mutations at different sites of MSTN and may also be species-related. For example, Belgian Blue cattle with a natural mutation in the MSTN gene have an increased number of myofibers and a reduced myofiber cross-sectional area [1], whereas knockdown of the MSTN gene in mice increases the cross-sectional area of muscle fibers without affecting their number [2,3]; we describe this further in the Discussion. A possible complementary regulation by FGF5 cannot be ruled out either, but unfortunately this makes the problem extraordinarily complex. We plan to produce single-mutant sheep in which the MSTN and FGF5 mutations are segregated in subsequent studies and to give full consideration to this problem.

      Regarding the muscle development of the gene-edited animals, because of the constraints of working with large animals and the limited number of edited individuals, we have not comprehensively evaluated muscle development in vivo to refine the potential cellular mechanisms of the muscle phenotype, apart from evaluating the expression of MSTN and FGF5 at 3 months of age and the expression of MSTN at 12 months of age (Figure S2A). To determine whether FOSL1 indeed acts downstream of MSTN, we added data on FOSL1 expression under serum regulation to support our conclusions (Figure S4D-F).

      [1] Wegner J, Albrecht E, Fiedler I, Teuscher F, Papstein HJ, Ender K. Growth- and breed-related changes of muscle fiber characteristics in cattle[J]. Journal of Animal Science, 2000,78:1485-1496.

      [2] Nishi M, Yasue A, Nishimatu S, Nohno T, Yamaoka T, Itakura M, Moriyama K, Ohuchi H, Noji S. A missense mutant myostatin causes hyperplasia without hypertrophy in the mouse muscle[J]. Biochemical and Biophysical Research Communications, 2002,293:247-251.

      [3] Zhu X, Hadhazy M, Wehling M, Tidball JG, McNally EM. Dominant negative myostatin produces hypertrophy without hyperplasia in muscle[J]. FEBS Letters, 2000,474:71-75.

      As the significant findings are minimal, the amount of text, figures and tables provided is disproportionately excessive. A large number of different molecular techniques are employed to try to decipher the mechanism(s) that result in the observed phenotype, double muscling. The authors focus on the MEK-ERK-FOSL1 pathway and suggest that this is the key pathway/mechanism resulting in the phenotype observed in MSTNDel273 sheep. However, they provide very little solid evidence to support this notion.

      Thank you for your review. We have substantially streamlined the manuscript, removed some irrelevant information, and moved all non-essential figures and tables to the supplementary information. We have also added new data to further support that the MSTN Del273C mutation generates the muscle phenotype through the MEK-ERK-FOSL1 pathway.

      The manuscript is very long, complicated and difficult to read, given the minimal amount of significant information that is provided. It requires major rewriting to be published. Further, it lacks information in the Materials and Methods on the generation of animals, on histological techniques and on morphometric studies. There is no information provided on the sex of the animals produced and then analyzed. There are also a number of editorial mistakes, e.g. the authors refer to Tables S1-S4 in the Materials and Methods and Results sections, but no Tables S1-S4 are provided.

      Thank you for your review. We have greatly streamlined and significantly revised the manuscript. We have also added detailed information on animal generation and on the histological and morphological studies to the Materials and Methods, together with information on the gene-edited animals produced, including sex and age. Finally, we reviewed the entire manuscript and corrected omissions and oversights such as the missing Tables S1-S4.

      Recommendations for the authors:

      Suggestions to improve the paper (see also public review):

      - Include the method part of generating the gene edited animals.

      We thank the editor and reviewers for pointing out this omission. We have provided detailed information on how the gene-edited sheep were generated in the Materials and Methods; they were produced by injecting MSTN sgRNA, FGF5 sgRNA, and Cas9 mRNA into embryos in different ratios.

      - Increase the number of Mstn-/-;Fgf5-/- experimental animals to allow acquisition of statistically relevant data. This is very important as the muscle phenotype of the F1 generation is not obvious. The authors should provide data showing that the Mstn mutation indeed invalidates myostatin signaling. They should provide data on myostatin protein and Mstn transcription, as well as on myostatin target genes, in Mstn-/-;Fgf5-/- sheep.

      Many thanks to the editor and reviewers for their constructive suggestions. Using MF-/- sheep to validate the transcription of myostatin and its target genes would indeed be the best strategy. However, we generated only one MF-/- sheep, which seriously limits the implementation of this strategy and would make any statistical analysis based on MF-/- sheep unreliable. For these reasons, our current study focuses mainly on heterozygous MF+/- sheep. We plan to generate sheep homozygous for single-gene mutations, with the MSTN and FGF5 mutations separated, in subsequent studies and to give full consideration to this issue.

      - They should also provide data on myostatin target genes in muscles from heterozygous animals.

      Thank you for your very informative suggestions. We have quantified the mRNA expression levels of the receptors and downstream target genes of MSTN in the gluteus medius of heterozygous MF+/- sheep. Compared with WT sheep, the mRNA expression levels of the type I receptor (ACVR1) and type II receptors (ACVR2A, ACVR2B) were highly significantly increased in the muscle of MF+/- sheep (Figure S4A), there was no significant change in the mRNA expression levels of the Smad family (Figure S4B), and the mRNA expression level of JunB of the Jun family, a downstream target gene of MSTN, was significantly downregulated (Figure S4C). These results suggest that the effect of the MSTN Del273C mutation with FGF5 knockout may not be limited to MEK-ERK-FOSL1. Again, we would like to thank the editor and reviewers for their constructive suggestions, which provide a new direction for deepening our insight into mutations of the MSTN gene.

      - The morphometric results on fiber CSA seem wrong. Judging by the fiber sizes and the scale bar in Figure 2H, the estimated CSA should be far higher. There must be a systematic error in the use of the morphometric algorithm.

      Thank you very much for your careful review. There were indeed errors in the morphological analysis of the longissimus dorsi and gluteus medius of the MF-/- sheep. On checking, we found that the muscle fiber size was much lower than the values in previously published sheep reports because the scale bar had been used incorrectly. We therefore re-scanned the tissue sections and used the correct scale bar to re-measure the cross-sectional area of muscle fibers and re-count the number of muscle fibers per unit area. With this correction, the average cross-sectional area of muscle fibers in WT sheep was similar to that in the previous report.
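To illustrate why a scale-bar error can shift cross-sectional areas so strongly, the following is a minimal sketch, not the authors' actual analysis pipeline. It assumes the error amounts to a wrong length calibration (micrometers per pixel), in which case areas are off by the square of the calibration ratio. The function name and the numbers are hypothetical.

```python
def corrected_area(area_reported_um2, scale_used, scale_true):
    """Rescale an area that was measured with a wrong length calibration.

    scale_used and scale_true are length calibrations in micrometers per
    pixel (hypothetical parameters). Because area is length squared, the
    correction factor is the square of the calibration ratio.
    """
    return area_reported_um2 * (scale_true / scale_used) ** 2


# Hypothetical numbers: a fiber reported as 500 um^2 with a calibration of
# 0.5 um/pixel when the true calibration was 1.0 um/pixel.
print(corrected_area(500.0, 0.5, 1.0))
```

A twofold error in the length calibration therefore produces a fourfold error in the reported area, which is why re-measuring with the correct scale bar can bring values back in line with published ones.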

      - The labeling of the ordinate of Fig. 2I is not readable (x1000 µm2, or x100 µm2?). The authors should make sure that they look at the same muscle part, as fiber sizes can vary greatly depending on the exact anatomical location. In small laboratory animals, entire muscle cross sections are usually analyzed to prevent such bias. This may prove difficult in large animals; however, small muscles could easily be identified and cross sections of entire muscles analyzed. As myostatin KO concerns all skeletal muscles, the authors could consider muscles such as the FDB or extraocular muscles.

      Thank you for your careful review and suggestions. The vertical axis of Figure 2I is in units of ×1000 μm2, and each data point represents the measured area of one muscle fiber. Because muscle fiber size varies considerably, we plotted the measured areas of all individual muscle fibers, and the mean of the scatter plot was used as the average area of all muscle fibers. We did this to display the distribution of muscle fiber sizes more intuitively.

      - The material of methods of muscle histology and morphometric studies must be included.

      Thank you for your suggestions. We have added the methods for the muscle histology and morphology studies, as well as the statistical methods for the cross-sectional area and number of muscle fibers, to the Materials and Methods.

      - In figures, the numbers of experimental animals should be given throughout, as well as the number of technical repeats. The authors need to provide some minimal data on how the genetically engineered sheep were produced, in addition to how many, their sex, etc., and which of these were analyzed to obtain the data. It is impossible to know when reading this manuscript whether data involving, for example, gene sequencing, westerns, or microscopic images come from one sheep or from some compilation of data.

      Thank you very much for your constructive suggestions, which have been of great help in improving the quality of our manuscript. We have clearly stated the number of experimental animals and the number of biological replicates in all figure legends. We have also provided detailed information on how the gene-edited sheep were generated in the Materials and Methods; they were produced by injecting MSTN sgRNA, FGF5 sgRNA, and Cas9 mRNA into embryos in different ratios.

      - As authors work on Mstn;Fgf5 double KO animals, they should explore whether Fgf5 is expressed in developing sheep muscle, and whether combined KO entails a synergistic effect on muscle development.

      We measured the expression of FGF5 in muscle tissue of WT and MF+/- sheep at 3 months of age; it was significantly reduced in MF+/- sheep compared to WT sheep (Figure S2A). We greatly appreciate your very meaningful and valuable comments on the possible synergistic effects of the combined knockout. Our current study is limited by the lack of sheep with single-gene knockouts of MSTN and FGF5, especially homozygous mutants. We will generate MSTN and FGF5 single-gene-edited sheep to further explore possible synergistic effects in a future study.

      - The authors should address the question of why their mstn mutation causes fiber hypotrophy, whereas most other work reported the opposite. Why would herein generated mutation act differently? Does mutated myostatin gain a different biological effect? Does it bind to different receptors?

      Thank you very much for your valuable comment. Regarding the possibility of muscle fiber dystrophy in sheep carrying the MSTN Del273C mutation with FGF5 knockout, we performed a statistical analysis of the proportion of muscle fibers with central nuclei in MF+/- sheep, which can characterize the occurrence of muscle dystrophy to some extent. There was no significant difference in the proportion of muscle fibers with central nuclei between WT and MF+/- sheep (Figure S2E). We also analyzed the mRNA expression levels of MTM1, DMD, IGF1, SMN1, and GAA, genes related to muscle fiber dystrophy and muscle atrophy. Although the levels of MTM1, IGF1, SMN1, and GAA were significantly increased (Figure S2F), this elevation did not lead to muscle fiber dystrophy or muscle atrophy; instead, it was beneficial for muscle formation. We therefore suggest that the phenomenon produced by the MSTN Del273C mutation with FGF5 knockout is probably not muscle fiber dystrophy, as the MSTN Del273C mutation with FGF5 knockout significantly promoted the proliferation of sheep skeletal muscle satellite cells (Figure 3A-F). More importantly, the MSTN Del273C mutation with FGF5 knockout improves the muscle phenotype of sheep, including the "double-muscle" phenotype of the rump (Figure 2A), the proportion of gluteus medius relative to the carcass (Figure 2K), and the proportion of hind leg meat (Table S7). In addition, we analyze in detail in the Discussion why the current mutation produces a phenotype different from that in other reports, which we suggest may be due to the different mutation sites. Whether mutated myostatin acquires different biological effects and whether it binds to different receptors are indeed very thought-provoking questions, which we plan to address in homozygous MSTN and FGF5 mutant sheep.

      - Concerning the in vitro work, the authors need to demonstrate whether Mstn and/or FGF5 signaling pathways are altered in myoblasts/myotubes. As both are secreted factors, the authors need to show that serum conditioning is changing in myoblast cultures. The authors should perform cultures in which these factors are entirely suppressed and the signaling pathways thus shut down. They could use growth factor-depleted supplements and/or add myostatin and FGF5 inhibitors to the serum. They need to determine first the individual effects of myostatin and FGF5 and then challenge the combined effect. They should also perform the inverse experiment and supplement cultures with recombinant factors, both individually and in combination.

      We greatly appreciate your valuable suggestions. In addition to examining the MSTN pathway at the cellular level, we also assayed the expression of MSTN receptors and the downstream Smad and Jun families in the gluteus medius, and found that the MSTN Del273C mutation with FGF5 knockout led to upregulation of two receptors, while the expression of the downstream Smad and Jun families was also inhibited to varying degrees (Figure S4A). To account for possible serum regulation, we also added data on serum MSTN regulation. We had previously tested inhibitors of MSTN and FGF5 but did not observe any effect; we suggest this may be due to the nonspecificity of the inhibitors, as there are no sheep-specific MSTN and FGF5 inhibitors. Given that the phenotype of MSTN gene editing is mutation site dependent, we directly cultured skeletal muscle satellite cells using serum from WT and MF+/- sheep. We found that serum from MF+/- sheep promoted the proliferation of skeletal muscle satellite cells (Figure S4D). The MSTN Del273C mutation with FGF5 knockout promoted FOSL1 expression in cells cultured with WT sheep serum (Figure S4E), which was similar to the results of FBS culture and HS induction. The serum from MF+/- sheep strongly stimulated FOSL1 expression and the inhibition of MyoD1 (Figure S4F). These results indicate that serum regulation cannot be ignored after the MSTN Del273C mutation with FGF5 knockout.

      - With above suggested additional experiments, authors would also be able to demonstrate, whether Fosl1 is indeed triggered in response to myostatin and/or FGF5 signaling.

      To determine whether FOSL1 indeed acts downstream of MSTN, we supplemented the expression levels of FOSL1 under serum regulation to support our conclusions. We found that the serum from MF+/- sheep strongly stimulated FOSL1 expression and the inhibition of MyoD1 (Figure S4F).

      - The authors used the t-test in several analyses despite low sample numbers, which as such violates the assumption of equal variance. Non-parametric tests should be used in this case.

      Thank you very much for your valuable comments. We apologize for the previous incorrect use of statistical methods. In the revised version, we have re-analyzed all data. Before performing Student's t-test, we first evaluated the assumptions of normal distribution and equal variance. Two-tailed Student's t-tests were used only for data that conformed to normal distribution and homogeneity of variance; otherwise, Welch's corrected t-tests were performed.
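The decision rule described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual analysis code. The outcomes of the normality and equal-variance checks are passed in as flags (how those checks were carried out is not specified here), the function names are hypothetical, and only the test statistic and degrees of freedom are computed, not the p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def choose_test(a, b, is_normal, equal_variance):
    """Apply the stated rule: Student's t only if both assumptions hold."""
    if is_normal and equal_variance:
        return "student", student_t(a, b)
    return "welch", welch_t(a, b)


# Hypothetical measurements for two groups with clearly different variances.
wt = [1.0, 2.0, 3.0, 4.0, 5.0]
mf = [2.0, 4.0, 6.0, 8.0, 10.0]
print(choose_test(wt, mf, is_normal=True, equal_variance=False))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so the choice of test matters most when group sizes and variances are both unequal.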

      - Authors should state in the legends which statistical test was used.

      Thank you for your suggestion. We have clearly stated the statistical testing method used in all figure legends, which is indeed necessary and important.

      In general, this manuscript should be dramatically scaled back in terms of content, eliminating unnecessary text, figures and tables that do not play a significant role in the findings that were significant. There is some interesting information and data here that can add to the overall base of knowledge surrounding the production of genetically engineered livestock in which myostatin has been targeted for mutation. However, the authors need to focus on their findings that were significant and strongly supported by the data and statistical analysis. Some discussion of findings that support their ideas/hypothesis, but are not statistically significant is fine. But it should not make up the majority of the manuscript which is the case here.

      Thank you for your valuable suggestions, which are essential for improving the quality of our manuscript. We have greatly streamlined and significantly revised the manuscript, and removed unnecessary text, figures, and tables.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      Arpin is a negative regulator of Arp2/3 activity. Here the authors investigated the role of arpin in vascular permeability using appropriate cultured human and murine endothelial monolayers and successfully developed arpin KO mice. The results clearly show arpin is expressed in blood vessels (not clear about lymphatics but given leaky vessels, one wonders). The data show that arpin is important for vessel barrier function, yet its genetic loss still leads to viable animals in the C57Blk strain, albeit with leaky blood vessels. The data are well presented and controls are included. However, the evidence that arpin loss/knockdown causes increased actin functions independent of Arp2/3 is based on pharmacological data and is indirect. The authors conclude that ROCK1 activity is elevated and is the cause of the barrier function lost upon arpin reduction. I do have one suggestion for the authors that involves a new study in these animals, which could strengthen their proposed mechanism that the vascular defects are independent of Arp2/3 activity and rather involve ROCK1 but not ZIPK.

      (1) If arpin is working via ROCK1, as the authors infer, perhaps treatment of arpin-/- mice with ROCK1 inhibitor(s) would attenuate vessel permeability while HS38 treatment would not. This type of study would strengthen the conclusion that ROCK1, but not ZIPK, was involved. Including CK666 if active in mouse cells, could also be tested.

      To analyze vascular permeability in vivo, we performed Miles assays in arpin+/+ and arpin-/- mice using the inhibitors of ROCK1 (Y27632) and ZIPK (HS38). Both Y27632 and HS38 reduced the permeability caused by absence of arpin (new Figure 8E), thus confirming what we observed before in HUVEC (shown in old Figure 7). CK666 did not change the permeability in arpin-/- mice, thus confirming the conclusion that arpin does not regulate vascular permeability via Arp2/3 but rather via ROCK1/ZIPK-mediated stress fiber formation (page 13).

      (2) Fig 5. The data demonstrate that arpin regulates actin filament formation and permeability in HUVEC, but this does not demonstrate that it occurs in an Arp2/3-independent manner. If I understand your data, this is indirect evidence. One needs more information to reach this conclusion. Can the authors measure Arp2/3 activity directly and then test whether arpin knockdown and CK666 have the same capacity to reduce Arp2/3 activity in vitro?

      Arp2/3 activity cannot be measured directly. The commonly used approach is therefore Arp2/3 inhibition via CK666. Our new in vivo permeability assays (see answer above) together with our HUVEC data in Figure 5 clearly show that CK666 does not have the same effect as arpin knock-down, and neither does CK666 rescue the effects of arpin deficiency in vitro and in vivo. Together, these findings clearly suggest that arpin does not regulate endothelial permeability via Arp2/3.

      Minor issues:

      Fig 2, 3 or other Figs: In presented western blots, all proteins should include appropriate mw labels.

      Thank you. Molecular weights have been added to all Western blots.

      Fig 2. Suggest that like your arpin analysis, amounts of AP1AP and PICK1 at baseline and TNF-treatment by blotting should be included. A minor point is yellow color for labels does not stand out and should be changed to another color - as the authors used in Fig 2C.

      We have included Western blots and quantifications for PICK1 in Figure S1A and S1C. An antibody against AP1AP was unfortunately not available.

      The yellow color has been changed to purple for better visibility.

      Fig 2C. The arpin loss at junctions and actin filaments (Figure 2C) is very minor even though it reached statistical significance. It really is not an obvious loss from your 3 color overlay.

      Thank you. It is indeed hard to see. We included now magnifications in Figure 2C that better show the loss of arpin at junctions.

      Fig 8, text 303-310 shows in vivo evidence of lung congestion and edema. There also appear to be inflammatory cells present in the images. If these are inflammatory cells, it raises the question of whether these mice have an abnormal complete blood cell count (CBC). Suggest adding CBC data for arpin-/- vs control arpin+/+ mice in Fig 8.

      The pathologist observed the presence of lymphocytes and macrophages, indicating the possibility of a (low level) chronic inflammation in arpin-deficient lungs. However, we now also performed hemograms of the mice (new Table S2) that showed no significant difference in the blood cell count of arpin-/- and arpin+/+ mice. Thus, the presence of lymphocytes and macrophages cannot be explained simply by higher leukocyte counts (page 14).

      Line 289, pg 13, Fig 8: Lung levels of arpin are not shown in Fig 8B. Authors must mean another fig?

      Sorry. Arpin protein levels in lungs are shown in figure 8C. This has been corrected on page 13.

      Reviewer #2 (Recommendations For The Authors):

      This is a solid piece of work that adds a small amount of additional factual information to our understanding of cell-cell junctions. The experimental work is of good quality and is sufficient to support the aims of the paper. I think the value of the work is to add this small amount of new knowledge to the archive. I do not believe that further experimental work would add to the paper - it's done. But this doesn't have the impact or completeness for this journal. It belongs in a for-the-record journal.

      We appreciate your overall positive evaluation and your comments that our study represents a solid piece of work with good quality experimental work. However, we are not sure what you mean by “it belongs in a for-the-record journal”. Anyway, we agree that our study does not reveal a complete mechanism of how arpin regulates actin stress fibers, but we respectfully disagree that our study only adds a “small amount of additional factual information”. We may not have been very clear about it, but we present in this study several new discoveries and although some are descriptive in nature that does not make them trivial or less important. We provide for the first time experimental evidence that: 1) arpin is expressed in endothelial cells in vitro and in vivo, and downregulated during inflammation; 2) presence of arpin is required for proper endothelial permeability regulation and junction architecture; 3) arpin exerts these functions in an Arp2/3-independent manner; 4) arpin controls actomyosin contractility in a ROCK1- and ZIPK-dependent fashion; 5) arpin knock-out mice are viable and breed and develop normally but show histological characteristics of a vascular phenotype and increased vascular permeability that can be rescued by inhibition of ROCK1 and ZIPK. The fact that arpin fulfills its functions in endothelial cells independently of the Arp2/3 complex is of special relevance as previously the only known function of arpin was the inhibition of the Arp2/3 complex. Thus, we believe that our study adds a significant amount of new information to the literature. Thank you very much.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary: 

      BMP signaling is, arguably, best known for its role in dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, their own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up- or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were prominent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP tagged SMA-3 and SMA‑9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, we have provided the first comprehensive analysis of these datasets.

      Strengths: 

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans. 

      Weaknesses: 

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect. 

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We will revise the text to highlight what this study and others demonstrate about SMA-9’s role in body size. We also plan to analyze additional target genes to deepen our model for how SMA-3 and SMA-9 interact functionally to produce a given transcriptional response.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. The limitation in the previous work is that only a small number of target genes was analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale.  Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We will revise the text to clarify this point.

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes, with the opposing mode of regulation by SMA-3 and SMA-9. 

      We appreciate this suggestion and will clarify how SMA-9 and its target genes contribute to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion. 

      We thank the reviewer for this suggestion. We will add more context to the Discussion.

      Reviewer #2 (Public Review): 

      In the present study, Vora et al. elucidated the transcription factors downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smads SMA-3 and SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of numerous of these targets on the regulation of body size in vivo as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers.

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data. 

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses? 

      We agree that these are intriguing questions and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect. 

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which. 

      We thank the reviewer for this suggestion and will add the requested information in the text.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated. 

      rol-6 has been identified as a transcriptional target of this pathway. The level of ROL-6 protein, however, is not changed in sma-3 and sma-9 mutants, indicating that there is post-transcriptional compensation. We will include these data in the revised manuscript.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.). 

      We will add this information to the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary Responses: Besides the WT allele, equivalent to the mouse TMEM173 gene, the human TMEM173 gene has two common alleles: the HAQ and AQ alleles carried by billions of people. The main conclusions and interpretation, summarized in the Title and Abstract, are i) Different from the WT TMEM173 allele, the HAQ or AQ alleles are resistant to STING activation-induced cell death; ii) STING residue 293 is critical for cell death; iii) HAQ, AQ alleles are dominant to the SAVI allele; iv) One copy of the AQ allele rescues the SAVI disease in mice. We propose that STING research and STING-targeting immunotherapy should consider human TMEM173 heterogeneity. These interpretations and conclusions were based on Data and Logic. We welcome alternative, logical interpretations and collaborations to advance the human TMEM173 research.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Aybar-Torres et al investigated the effect of common human STING1 variants on STING-mediated T cell phenotypes in mice. The authors previously made knock-in mice expressing human STING1 alleles HAQ or AQ, and here they established a new knock-in line Q293. The authors stimulated cells isolated from these mice with STING agonists and found that all three human mutant alleles resist cell death, leading to the conclusion that R293 residue is essential for STING-mediated cell death (there are several caveats with this conclusion, more below). The authors also bred HAQ and AQ alleles to the mouse Sting1-N153S SAVI mouse and observed varying levels of rescue of disease phenotypes with the AQ allele showing more complete rescue than the HAQ allele. The Q293 allele was not tested in the SAVI model. They conclude that the human common variants such as HAQ and AQ have a dominant negative effect over the gain-of-function SAVI mutants.

      Strengths:

      The authors and Dr. Jin's group previously made important observations of common human STING1 variants, and these knock-in mouse models are essential for understanding the physiological function of these alleles.

      Weaknesses:

      However, although some of the observations reported here are interesting, the data collectively does not support a unified model. The authors seem to be drawing two sets of conclusions from in vitro and in vivo experiments, and neither mechanism is clear. Several experiments need better controls, and these knock-in mice need more comprehensive functional characterization.

      (1) In Figure 1, the authors are trying to show that STING agonist-induced splenocytes cell death is blocked by HAQ, AQ and Q alleles. The conclusion at line 134 should be splenocytes, not lymphocytes. Most experiments in this figure were done with mixed population that may involve cell-to-cell communication. Although TBK1-dependence is likely, a single inhibitor treatment of a mixed population is not sufficient to reach this conclusion.

      We greatly appreciate Reviewer 1's insights. We changed “lymphocytes” to “splenocytes” (line 133) as suggested. We respectfully disagree with Reviewer 1’s comments on TBK1. First, we used two different TBK1 inhibitors: BX795 and GSK8612. Second, because BX795 also inhibits PDK1, we used a PDK1 inhibitor, GSK2334470. Third, both BX795 and GSK8612 completely inhibited diABZI-induced splenocyte cell death (Figure 1B) (lines 128 – 133). The logical conclusion is “TBK1 activation is required for STING-mediated mouse spleen cell death ex vivo” (line 117).

      Our discovery that the common human TMEM173 alleles are resistant to STING activation-induced cell death is a substantial finding. It further strengthens the argument that the HAQ and AQ alleles are functionally distinct from the WT allele 1-3. We wish to underscore the crucial message of this study, that 'STING research and STING-targeting immunotherapy should consider TMEM173 heterogeneity in humans' (line 37), which has been largely overlooked in current STING clinical trials 4.

      Regarding STING-Cell death, as we stated in the Introduction (lines 65-77): i) STING-mediated cell death is cell type-dependent 5-7 and type I IFNs-independent 5,7,8. ii) The in vivo biological significance of STING-mediated cell death is not clear 7,8. iii) The mechanisms of STING-Cell death remain controversial. Multiple cell death pathways, i.e., apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis, are proposed 7,9,10. SAVI/HAQ and SAVI/AQ prevented lymphopenia and alleviated SAVI disease in mice. Thus, the manuscript provides some answers to the biological significance of STING-cell death in vivo, which is new. Regarding the molecular mechanism, splenocytes from Q293/Q293 mice are resistant to STING cell death. The logical conclusion is that the amino acid 293 is critical for STING cell death (line 29).

      Extensive studies, beyond the scope of this manuscript, are needed on how aa293 and TBK1 mediate STING-Cell death to resolve the controversies in the STING-cell death field (e.g. apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis).

      (2) Q293 knock-in mouse needs to be characterized and compared to HAQ and AQ. Is this mutant expressed in tissues? Does this mutant still produce IFN and other STING activities? Is the protein expression level altered on Western blot? Is the mutant protein trafficking affected? In the authors' previous publications and some of the Western blots here, expression levels of each of these human STING1 proteins in mice are drastically different. HAQ and AQ also have different effects on metabolism (pmid: 36261171), which could complicate interpretation of the T cell phenotypes.

      These are very important questions that require rigorous investigations that are beyond the scope of this manuscript. This manuscript, titled “The common TMEM173 HAQ, AQ alleles rescue CD4 T cellpenia, restore T-regs, and prevent SAVI (N153S) inflammatory disease in mice”, does not focus on Q293 mice. We have been investigating the common human TMEM173 alleles since 2011, from the discovery 11, mouse model 1,3, and human clinical trial 2 to human genetics studies 3. This manuscript is another step towards understanding these common human TMEM173 alleles, with the new discovery that HAQ, AQ alleles are resistant to STING cell death.

      (3) HAQ/WT and AQ/WT splenocytes are protected from STING agonist-induced cell death equally well (Figure 1G). HAQ/SAVI shows less rescue compared to AQ/SAVI. These are interesting observations, but mechanism is unclear and not clearly discussed. E.g., how does AQ protect disease pathology better than HAQ (that contains AQ)? Does Q293 allele also fully rescue SAVI?

      In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 251 – 261). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 than HAQ T-regs 3. Thus, increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) (lines 332-335). We are exploring these possibilities.  

      (4) Figure 2 feels out of place. First of all, why are the authors using human explant lung tissues? PBMCs should be a better source for lymphocytes. In untreated conditions, both CD4 and B cells show ~30% dying cells, but CD8 cells show 0% dying cells. This calls for technical concerns on the CD8 T cell property or gating strategy because in the mouse experiment (Figure 1A) all primary lymphocytes show ~30% cell death at steady-state. Second, in Figure 2C, this type of partial effect needs multiple human donors to confirm. Third, the reconstitution of THP1 cells seems out of place. STING-mediated cell death mechanisms in myeloid and lymphoid cells are likely different. If the authors want to demonstrate cell death in myeloid cells using THP1, then these reconstituted cell lines need to be better validated (expression, IFN signaling, etc.). The parental THP1 cells are HAQ/HAQ; how does that compare to the reconstitutions? There are published studies showing THP1-STING-KO cells reconstituted with human variants do not respond to STING agonists as expected. The authors need to be scientifically rigorous on validation and cautious in their interpretations.

      Figure 2 is necessary because it reveals the difference between mouse and human STING cell death, which is critical to understand STING in human health and diseases (lines 160-161). Figure 2A-2B showed that STING activation killed human CD4 T cells, but not human CD8 T cells or B cells. This observation is different from Figure 1A, where STING activation killed mouse CD4, CD8 T cells, and CD19 B cells, revealing the species-specific STING cell death responses. Regarding human CD8 T cells, as we stated in the Discussion (lines 323-325), human CD8 T cells (PBMC) are not as susceptible as the CD4 T cells to STING-induced cell death 8. We used lung lymphocytes that showed similar observations (Figure 2A). For Figure 2C, we used 2 WT/HAQ and 3 WT/WT individuals (lines 738-739). We generated HAQ, AQ THP-1 cells in STING-KO THP-1 cells (Invivogen, cat no. thpd-kostg) (lines 380-387).

      A recent study found that a new STING agonist SHR1032 induces cell death in STING-KO THP-1 cells expressing WT(R232) human STING 10 (line 182). SHR1032 suppressed THP1-STING-WT(R232) cell growth at GI50: 23 nM while in the parental THP1-STING-HAQ cells, the GI50 of SHR1032 was >103 nM 10. Cytarabine was used as an internal control where SHR1032 killed more robustly than cytarabine in the THP1-STING-WT(R232) cells but much less efficiently than cytarabine in the THP-1-STING-HAQ cells 10. 

      Our manuscript rigorously uses mouse splenocytes, human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo.

      We agree with Reviewer 1 that STING-mediated cell death mechanisms in myeloid and lymphoid cells may be different and likely contribute to the different mechanisms proposed in STING cell death research 7,9,10. Our study focuses on the in vivo STING-mediated T cellpenia.

      (5) Figure 2G, H, I are confusing. AQ is more active in producing IFN signaling than HAQ and Q is the least active. How to explain this?

      We stated in the Introduction that “AQ responds to CDNs and produces type I IFNs in vivo and in vitro 3,12,13” (lines 92-93). We reported that the AQ knock-in mice responded to STING activation 3. We previously showed that there was negative natural selection on the AQ allele in individuals outside of Africa 3: 28% of Africans are WT/AQ but only 0.6% of East Asians are WT/AQ 3. In contrast, the HAQ allele was positively selected in non-Africans 3. Investigation to understand the mechanisms and biological significance of these naturally selected human TMEM173 alleles has been ongoing in the lab.

      (6) The overall model is unclear. If HAQ, AQ and Q are loss-of-function alleles and Q is the key residue for STING-mediated cell death, then why AQ is the most active in producing IFN signaling and AQ/SAVI rescues disease most completely? If these human variants act as dominant negatives, which would be consistent with the WT/het data, then how do you explain AQ is more dominant negative than HAQ?

      In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 251 – 261). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondria activity than HAQ T-regs 3. Nevertheless, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele (lines 332-335). Last, we used modern human evolution to discover the dominance of these common human STING alleles. In modern humans outside Africa, HAQ was positively selected while AQ was negatively selected 3. However, AQ is likely dominant to HAQ because there are no HAQ/AQ individuals outside Africa. The genetic dominance of common human TMEM173 alleles is a new concept. More investigation is ongoing.

      (7) As a general note, SAVI disease phenotypes involve multiple cell types. Lymphocyte cell death is only one of them. The authors' characterization of SAVI pathology is limited and did not analyze immunopathology of the lung.

      Both radioresistant parenchymal and/or stromal cells and hematopoietic cells influence SAVI pathology in mice 14,15. Nevertheless, the lack of CD4 T cells, including the anti-inflammatory T-regs, likely contributes to the inflammation in SAVI mice and patients 16. We characterized lung function, lung inflammation (Figure 4), lung neutrophils, and inflammatory monocyte infiltration (Figure S5) (lines 232-235).

      (8) Line 281, the discussion on HIV T cell death mechanism is not relevant and over-stretching. This study did not evaluate viral infection in T cells at all. The original finding of HAQ/HAQ enrichment in HIV/AIDS was 2/11 in LTNP vs 0/11 in control, arguably not the strongest statistics.

      Several publications have linked STING to HIV pathogenesis 17-22 (line 271). CD4 T cellpenia is a hallmark of AIDS. The manuscript studies STING activation-induced T cellpenia in vivo. It is not a stretch to ask, for example, whether preventing STING T cell death (e.g., with HAQ, AQ alleles) can restore CD4 T cell counts and improve care for AIDS patients.

      Reviewer #2 (Public Review):

      Aybar-Torres and colleagues utilize common human STING alleles to dissect the mechanism of SAVI inflammatory disease. The authors demonstrate that these common alleles alleviate SAVI pathology in mice, and perhaps more importantly use the differing functionality of these alleles to provide insight into requirements of SAVI disease induction. Their findings suggest that it is residue A230 and/or Q293 that are required for SAVI induction, while the ability to induce an interferon-dependent inflammatory response is not. This is nicely exemplified by the AQ/SAVI mice that have an intact inflammatory response to STING activation, yet minimal disease progression. As both mutants seem to be resistant to STING-dependent cell death, this manuscript also alludes to the importance of STING-dependent cell death, rather than STING-dependent inflammation, in the progression of SAVI pathology. While I have some concerns, I believe this manuscript makes some important connections between STING pathology mouse models and human genetics that would contribute to the field.

      Some points to consider:

      (1) While the CD4+ T cell counts from HAQ/SAVI and AQ/SAVI mice suggest that these T cells are protected from STING-dependent cell death, an assay that explores this more directly would strengthen the manuscript. This is also supported by Fig 2C, but I believe a strength of this manuscript is the comparison between the two alleles. Therefore, if possible, I would recommend the isolation of T cells from these mice and direct stimulation with diABZI or other STING agonist with a cell death readout.

      Please see the new Figure S3 for cell death by diABZI, DMXAA in Splenocytes from WT/WT, WT/HAQ, HAQ/SAVI, AQ/SAVI mice. The HAQ/SAVI and AQ/SAVI splenocytes showed similar partial resistance to STING activation-induced cell death (lines 214-216).

      (2) Related to the above point - further exemplifying that the Q293 locus is essential to disease, even in human cells, would also strengthen the paper. It seems that CD4 T cell loss is a major component of human SAVI. While not completely necessary, repeating the THP1 cell death experiments from Fig 2 with a human T cell line would round out the study nicely.

      We examined HAQ, AQ mouse splenocytes, HAQ human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo. Additional human T cell line work would not add much. We hope to conduct more STING cell death experiments on human PBMC or lung lymphocytes from HAQ, AQ individuals as we continue the human STING alleles investigation.

      (3) While I found the myeloid cell counts and BMDM data interesting, I think some more context is needed to fully loop this data into the story. Is myeloid cell expansion exemplified by SAVI patients? Do we know if myeloid cells are the major contributors to the inflammation these patients experience? Why should the SAVI community care about the Q293 locus in myeloid cells?

      This is likely a misunderstanding. We used BMDM to compare STING signaling (TBK1, IRF3, NFkB, STING activation) among WT/SAVI, HAQ/SAVI, and AQ/SAVI. Ideally, we would like to compare STING signaling in CD4 T cells from WT/SAVI mice to that in HAQ/SAVI and AQ/SAVI mice. However, WT/SAVI mice have no CD4 T cells. In using BMDM, we assume that the basic STING signaling (TBK1, IRF3, NFkB, STING activation) is conserved between T cells and macrophages.

      (4) The functional assays in Figure 4 are exciting and really connect the alleles to disease progression. To strengthen the manuscript and connect all the data, I would recommend additional readouts from these mice that address the inflammatory phenotype shown in vitro in Figure 5. For example, measuring cytokines from these mice via ELISA or perhaps even Western blots looking for NFkB or STING activation would be supportive of the story. This would also allow for some tissue specificity. I believe looking for evidence of inflammation and STING activation in the lungs of these mice, for example, would further connect the data to human SAVI pathology.

      Reviewer 2 suggests looking for evidence of inflammation and STING activation in the lungs of HAQ/SAVI, AQ/SAVI. We would like to elaborate further. First, anti-inflammatory treatments, e.g. steroids, DMARDs, IVIG, Etanercept (TNF), rituximab, Nifedipine, amlodipine, et al., all failed in SAVI patients 23. JAK inhibitors on SAVI had mixed outcomes (lines 55-58). Second, Figure S5 examined lung neutrophils and inflammatory monocyte infiltration. Interestingly, while AQ/SAVI mice had better lung function than HAQ/SAVI mice (Figure 4D, 4E vs 4H, 4I), HAQ/SAVI and AQ/SAVI lungs had comparable neutrophils and inflammatory monocyte infiltration (Figure S5). Last, SAVI is classified as type I interferonopathy 23, but the lung diseases of SAVI are mainly independent of type I IFNs 24-27. The AQ allele suppresses SAVI in vivo. Understanding the mechanisms by which AQ rescues SAVI may lead to curative care for SAVI patients.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      One suggestion is to streamline this study by focusing on STING-mediated cell death only in CD4 T cells. The authors can use in vitro PBMC isolated human T cells, ex vivo T cells from the knock-in mice, and in vivo T cells from the SAVI breeding. The current manuscript includes myeloid cell death, Tregs, complex SAVI disease pathology, which is too confusing and too complex to explain with the varying effect from the three human STING1 variants.

      We sincerely appreciate Reviewer 1’s suggestion. The goal of our human STING alleles research has always been translational, i.e. improving human health. Even as a monogenetic disease, the SAVI pathology is still complex. For example, although considered a type I interferonopathy, SAVI is largely independent of type I IFNs. Similarly, STING-activation-induced cell death, while contributing to SAVI, is not the whole story, as the Reviewer pointed out in Comments 3, 6, and 7. HAQ/SAVI mice still died early and had lung dysfunction (Figure 4). In contrast, AQ/SAVI mice had restored lifespan and lung function. Figure 6 shows different T-regs between AQ/SAVI and HAQ/SAVI mice. In addition, AQ mice had more IL-10+ T-regs than HAQ mice 3. Therefore, we are excited about developing AQ-based curative therapy for SAVI patients (preventing cell death and inducing immune tolerance). Again, we thank the Reviewer for the suggestion. Additional research is ongoing.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Generation of THP1 cells with the human STING alleles is missing from methods.

      We added the protocol in the methods (lines 380-387). The THP-1 KO line stably expressing WT STING was first described by Weikang Tao’s group 10.

      (2) Some abbreviations are not expanded (CDA).

      CDA is expanded as cyclic di-AMP (e.g. line 375).

      References.

      (1) Patel, S. et al. The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele. J Immunol 198, 776-787 (2017).

      (2) Sebastian, M. et al. Obesity and STING1 genotype associate with 23-valent pneumococcal vaccination efficacy. JCI Insight 5 (2020).

      (3) Mansouri, S. et al. MPYS Modulates Fatty Acid Metabolism and Immune Tolerance at Homeostasis Independent of Type I IFNs. J Immunol 209, 2114-2132 (2022).

      (4) Sivick, K. E. et al. Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4183-4185 (2017).

      (5) Gulen, M. F. et al. Signalling strength determines proapoptotic functions of STING. Nat Commun 8, 427 (2017).

      (6) Kabelitz, D. et al. Signal strength of STING activation determines cytokine plasticity and cell death in human monocytes. Sci Rep 12, 17827 (2022).

      (7) Murthy, A. M. V., Robinson, N. & Kumar, S. Crosstalk between cGAS-STING signaling and cell death. Cell Death Differ 27, 2989-3003 (2020).

      (8) Kuhl, N. et al. STING agonism turns human T cells into interferon-producing cells but impedes their functionality. EMBO Rep 24, e55536 (2023).

      (9) Li, C., Liu, J., Hou, W., Kang, R. & Tang, D. STING1 Promotes Ferroptosis Through MFN1/2-Dependent Mitochondrial Fusion. Front Cell Dev Biol 9, 698679 (2021).

      (10) Song, C. et al. SHR1032, a novel STING agonist, stimulates anti-tumor immunity and directly induces AML apoptosis. Sci Rep 12, 8579 (2022).

      (11) Jin, L. et al. Identification and characterization of a loss-of-function human MPYS variant. Genes Immun 12, 263-269 (2011).

      (12) Yi, G. et al. Single nucleotide polymorphisms of human STING can affect innate immune response to cyclic dinucleotides. PLoS One 8, e77846 (2013).

      (13) Patel, S. et al. Response to Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4185-4188 (2017).

      (14) Gao, K. M. et al. Endothelial cell expression of a STING gain-of-function mutation initiates pulmonary lymphocytic infiltration. Cell Rep 43, 114114 (2024).

      (15) Gao, K. M., Motwani, M., Tedder, T., Marshak-Rothstein, A. & Fitzgerald, K. A. Radioresistant cells initiate lymphocyte-dependent lung inflammation and IFNgamma-dependent mortality in STING gain-of-function mice. Proc Natl Acad Sci U S A 119, e2202327119 (2022).

      (16) Hu, W. et al. Regulatory T cells function in established systemic inflammation and reverse fatal autoimmunity. Nat Immunol 22, 1163-1174 (2021).

      (17) Monroe, K. M. et al. IFI16 DNA sensor is required for death of lymphoid CD4 T cells abortively infected with HIV. Science 343, 428-432 (2014).

      (18) Doitsh, G. et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature 505, 509-514 (2014).

      (19) Jakobsen, M. R., Olagnier, D. & Hiscott, J. Innate immune sensing of HIV-1 infection. Curr Opin HIV AIDS 10, 96-102 (2015).

      (20) Silvin, A. & Manel, N. Innate immune sensing of HIV infection. Curr Opin Immunol 32, 54-60 (2015).

      (21) Altfeld, M. & Gale, M., Jr. Innate immunity against HIV-1 infection. Nat Immunol 16, 554-562 (2015).

      (22) Krapp, C., Jonsson, K. & Jakobsen, M. R. STING dependent sensing - Does HIV actually care? Cytokine Growth Factor Rev 40, 68-76 (2018).

      (23) Liu, Y. et al. Activated STING in a vascular and pulmonary syndrome. N Engl J Med 371, 507-518 (2014).

      (24) Luksch, H. et al. STING-associated lung disease in mice relies on T cells but not type I interferon. J Allergy Clin Immunol 144, 254-266 e258 (2019).

      (25) Stinson, W. A. et al. The IFN-gamma receptor promotes immune dysregulation and disease in STING gain-of-function mice. JCI Insight 7 (2022).

      (26) Warner, J. D. et al. STING-associated vasculopathy develops independently of IRF3 in mice. J Exp Med 214, 3279-3292 (2017).

      (27) Fremond, M. L. et al. Overview of STING-Associated Vasculopathy with Onset in Infancy (SAVI) Among 21 Patients. J Allergy Clin Immunol Pract 9, 803-818 e811 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a comprehensive description of transcriptional regulation in Pseudomonas syringae by investigating the binding characteristics of various transcription factors. They uncover the hierarchical network structure of the transcriptome by identifying top-, middle-, and bottom-level transcription factors that govern the flow of information in the network. Additionally, they assess the functional variability and conservation of transcription factors across different strains of P. syringae by studying DNA-binding characteristics. These findings notably expand our current knowledge of the P. syringae transcriptome.

      The findings associated with crosstalk between transcription factors and pathways, and the diversity of transcription factor functions across strains provide valuable insights into the transcriptional regulatory network of P. syringae. However, these results are at times underwhelming as their significance is unclear. This study would benefit from a discussion of the implications of transcription factor crosstalk on the functioning of the organism as a whole. Additionally, the implications of variability in transcription factor functions on the phenotype of the strains studied would further this analysis. Overall, this manuscript serves as a key resource for researchers studying the transcriptional regulatory network of P. syringae.

      Thank you for your positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The phytopathogenic bacterium Pseudomonas syringae is comprised of many pathovars with different host plant species and has been used as a model organism to study bacterial pathogenesis in plants. Transcriptional regulation is key to plant infection and adaptation to host environments by this bacterium. However, researchers have focused on a limited number of transcription factors (TFs) that regulate virulence-related pathways. Thus, a comprehensive, systems-level understanding of regulatory interactions between transcription factors in P. syringae has not been achieved.

      This study by Sun et al performed ChIP-seq analysis of 170 out of 301 TFs in P. syringae pv. syringae 1448A and used this unique dataset to infer transcriptional regulatory networks in this bacterium. The network analyses revealed hierarchical interactions between TFs, various network motifs, and co-regulation of target genes by TF pairs, which collectively mediate information flow. As discussed, the structure and properties of the P. syringae transcriptional regulatory networks are somewhat different from those identified in humans, yeast, and E. coli, highlighting the significance of this study. Further, the authors made use of the P. syringae transcriptional regulatory networks to find TFs of unknown functions to be involved in virulence-related pathways. For some of these TFs, their target specificity and biological functions, such as motility and biofilm formation, were experimentally validated. Of particular interest is the finding that despite conservation of TFs between P. syringae pv. syringae 1448A, P. syringae pv. tomato DC3000, P. syringae pv. syringae B728a, and P. syringae pv. actinidiae C48, some of the conserved TFs show different repertoires of target genes in these four P. syringae strains.

      Thank you for your positive comments.

      Strengths:

      This study presents a systems-level analysis of transcriptional regulatory networks in relation to P. syringae virulence and metabolism, and highlights differences in transcriptional regulatory landscapes of conserved TFs between different P. syringae strains, and develops a user-friendly database for mining the ChIP-seq data generated in this study. These findings and resources will be valuable to researchers in the fields of systems biology, bacteriology, and plant-microbe interactions.

      Thank you for your positive comments.

      Weaknesses:

      No major weaknesses were found, but some of the results may need to be interpreted with caution. ChIP-seq was performed with bacterial strains overexpressing TFs. This may cause artificial binding of TFs to promoters which may not occur when TFs are expressed at physiological levels. Another caution is applied to the interpretation of the biological functions of TFs. The biological roles of the tested TFs are based on in vitro experiments. Thus, functional relevance of the tested TFs during plant infection and/or survival under natural environmental conditions remains to be demonstrated.

      Thank you for your comments, and we agree with the reviewer. To rule out artificial binding of TFs, we performed EMSA to verify the identified targets. Our EMSA results confirmed the analyzed binding peaks.

      To verify the biological functions of the TFs, we also performed in vivo motility and biofilm production assays (Figures 3b-d). To further examine these functions, we performed a plant infection assay for TF PSPPH2193 under natural environmental conditions (bean leaves). As shown in Figures S6c and g, both the motility and the virulence of the ∆PSPPH2193 strain were significantly reduced compared with the WT strain. These results showed that TF PSPPH2193 positively regulates the pathogenicity of P. syringae by modulating bacterial motility.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to understand gene regulation of the plant bacterial pathogen Pseudomonas syringae. Although the function of some TFs has been characterized in this strain, a global picture of the gene regulatory network remains elusive. The authors conducted a large-scale ChIP-seq analysis, covering 170 out of 301 TFs of this strain, and revealed gene regulatory hierarchy with functional validation of some previously uncharacterized TFs.

      Thank you for your positive comments.

      Strengths:

      - This study provides one of the largest ChIP-seq datasets for a single bacterial strain, covering more than half of its TFs. This impressive resource enabled comprehensive systems-level analysis of the TF hierarchy.

      - This study identified novel gene regulation and function with validations through biochemical and genetic experiments.

      - The authors attempted on broad analyses including comparisons between different bacterial strains, providing further insights into the diversity and conservation of gene regulatory mechanisms.

      Thank you for your positive comments.

      Weaknesses:

      (1) Some conclusions are not backed by quantitative or statistical analyses, and they are sometimes overinterpreted.

      Thank you for your comments. We used a hypergeometric test in this analysis. Although only one gene was enriched in some pathways, the adjusted p-value was less than 0.05. We added the details to the revised manuscript.

      (2) Some figures and analyses are not well explained, and I was not able to understand them.

      Thank you for your comments, and we are sorry for the confusion. We defined ‘indirect interaction’ as ‘co-association’ and ‘cooperativity’ as ‘if the common target of two TFs is from a TF’. We added the definition of "indirect interaction" and "cooperativity" in the revised legend.

      For Figure S3a, the low co-association scores and large peak numbers of these top-level TFs indicated that top-level TFs preferred to solely regulate target genes, but not to co-regulate with other top-level TFs. PSPPH4700 was an example to show that top-level TFs with low co-association scores and large peak numbers tend to solely regulate target genes, but not to co-regulate with other top-level TFs. We revised the sentence to ‘For example, the top-level TF PSPPH4700 yielded over 1,700 peaks but cooperated with only 24 top-level TFs with low co-association scores about 0.05 (Supplementary Table 2b).’.

      We analyzed high co-association scores of 125 TFs in three levels and further determined the co-association patterns. To identify the tendency of co-association of all these 125 TFs, the co-association patterns were classified into 4 clusters. Bottom-level TFs tend to co-regulate target genes with other TFs. We revised the sentence in the revised manuscript.

      For Figure 2b, in C1, C2 and C4, many bottom-level TFs showed co-association patterns with other TFs, especially other bottom-level TFs (shown in C4). To explore the regulatory pattern in C3, the peak locations in the target genes of MexT were compared with those of the TFs in C3: seven top-level TFs (PSPPH1435, PSPPH1758, PSPPH2193, PSPPH2454, PSPPH4638, PSPPH4998 and PSPPH3411), three middle-level TFs (PSPPH1100, PSPPH5132 and PSPPH5144) and four bottom-level TFs (PSPPH0700, PSPPH2300, PSPPH2444 and PSPPH2580). MexT showed higher co-association scores (above 60) with more top-level TFs. We therefore concluded that MexT has closer co-association relationships with top-level TFs. We added this statement to the revised manuscript.

      For Figure 1a, the hierarchical network showed different number of TFs in three levels (54 top-level TFs, 62 middle-level TFs and 147 bottom-level TFs), which indicated that more than half of TFs (bottom-level TFs) tend to be regulated by other TFs and then directly bound to target genes. This finding showed a downward regulatory direction of transcription regulation in P. syringae. We revised the statement in the revised manuscript.

      (3) The Method section lacks depth, especially in data analyses. It is strongly recommended that the authors share their analysis codes so that others can reproduce the analyses.

      Thank you for your comments, and we defined the intergenic region before each TF sequence as the promoter region. As pHM1 plasmid carries its own constitutive promoter (lacZ promoter), we amplified the TF-coding sequence and cloned into site following the promoter. The TF protein expression was activated by the promoter of plasmid. Psph 1448A was used for our main ChIP-seq. We added the details in the revised manuscript.

      For Figure S3, we performed GO analysis on genes that were co-bound by TF pairs. We added the details in the revised manuscript.

      We shared our analysis code on GitHub (https://github.com/dengxinb2315/PS-PATRnet-code), as stated in the Data Availability section.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      (1) The specific strain of Pseudomonas syringae used in the study outside of the evolutionary analysis should be specified in the abstract and main text.

      Thank you for your suggestion. We revised the statements in abstract and main text to specific strains.

      (2) The language used throughout the manuscript should be revised for clarity, conciseness, and readability.

      Thank you for your suggestion. The language throughout the manuscript has been revised by a scientific editor who is a native speaker of English.

      (2) Line 688: Replace "80C" with "-80C".

      Thank you for your correction. We revised ‘80℃’ to ‘-80℃’. Please see Line 713.

      (3) Line 172 - 173: The abbreviations TT, MM, BB, TM, TB, and MB need to be expanded in the main text before their use.

      Thank you for your suggestion. We added the abbreviations TT, MM, BB, TM, TB, and MB in the manuscript. Please see Lines 172-174.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      (1) The name of the P. syringae strains used in each experiment/analysis should be explicitly stated (most experiments were carried out with P. syringae strain 1448A). This should also be applied to the introduction where many papers on P. syringae are cited without clear indication of strain names. I think this amendment is essential because target genes and thus biological functions of TFs could be different between P. syringae strains, as shown in the present study.

      Thank you for your suggestion. We revised the P. syringae strains in the citations throughout the manuscript.

      (2) How many TFs were analyzed throughout the study? Most sentences including line 22 in the abstract say 170, but I also found some say 270 (for example, line 106 and line 149). The legend of Figure 1 says 262. More detailed information is required regarding the datasets used for each analysis.

      Thank you for your suggestion. In this research, 170 TFs were analyzed by ChIP-seq; in our previous research, 100 TFs were analyzed by HT-SELEX. The hierarchical analysis integrated the ChIP-seq and HT-SELEX data, which together cover 270 TFs. As 8 TFs did not show hierarchical characteristics, the legend of Figure 1 refers to 262 TFs. We added the data sources to the revised manuscript. Please see Lines 104, 147, 160 and 1082.

      (3) Figure 1b: Please define "indirect interaction" and "cooperativity" in the legend as well as in the text. I only found the definition of "direct interaction".

      Sorry for the missing information. We defined ‘indirect interaction’ and ‘cooperativity’ as ‘co-association’ and ‘if the common target of two TFs is from a TF’, respectively. We added the definition of "indirect interaction" and "cooperativity" in the revised legend. Please see Lines 174-176, 1084-1086.

      (4) I found it very interesting that conserved TFs show different repertoires of target genes in different P. syringae strains. This suggests the rewiring of transcriptional regulatory networks in P. syringae strains, but the underlying mechanism is not explored in the current manuscript. It can be easily tested whether these conserved TFs bind to similar or different motifs by motif enrichment analysis. If they bind to similar motifs, it is possible that the promoter sequences of their target genes have diversified. Addressing or at least discussing these points would provide molecular insights into the diversification of the transcriptional regulatory networks in P. syringae. Similarly, functional enrichment analysis of target genes can be used to test whether the conserved TFs regulate different biological processes.

      Thank you for your suggestion. We added motif analysis and functional enrichment analysis of the target genes of two TFs (PSPPH3122 and PSPPH4127) in different P. syringae strains. For PSPPH3122, we found two different motifs (AGACN4GATCAA and CGGACGN3GATCA) in the 1448A and DC3000 strains, respectively. We also performed GO analysis and found functions specific to PSPPH3122 in Psph 1448A compared with the Pst DC3000 and Pss B728a strains, including recombinase activity and DNA recombination. For PSPPH4127, we found four different motifs in the four P. syringae strains. GO analysis showed its relationship with recombinase activity in the Psph 1448A strain, and with RNA binding, structural constituent of ribosome, translation and ribosome in the Pss B728a strain. These results indicated the high functional diversity of TFs in P. syringae. We added these points to the Results section, and Figures S9-S10 to the revised manuscript. Please see Lines 497-509.

      (5) Related to point 4, it would be quite useful if a list of orthologous genes of 1448A TFs in the other tested P. syringae strains were provided. Such information may also enhance the utility of the database developed in this study.

      Thank you for your suggestion. We added the list of orthologous genes of the 301 Psph 1448A TFs in the other tested P. syringae strains as Supplementary Table 5. Please see Line 467 and Supplementary Table 5.

      (6) Lines 243-246: It is unclear how these functional enrichment analyses were performed. Did you use target genes regulated by individual TFs or those coregulated by pairs of TFs? Please add more information for the sake of readers.

      Thank you for your suggestion. We performed the functional enrichment analyses with a hypergeometric test (BH-adjusted p < 0.05), using the target genes regulated by individual TFs. We added the details to the Results section. Please see Lines 248-252, 270, 1194-1195, 1199-1200 and 1205-1206.
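
      As an illustration only (this is not the authors' code), a hypergeometric enrichment test followed by Benjamini-Hochberg (BH) adjustment could be sketched as follows. The function names and the toy counts are hypothetical and are not taken from the manuscript; the sketch uses only the Python standard library.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Upper-tail probability P(X >= k).

    M: genes in the background, n: background genes annotated to the pathway,
    N: target genes of the TF, k: target genes annotated to the pathway.
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy case: 5000 background genes, a pathway with 2 genes,
# a TF with 20 targets, 1 of which belongs to the pathway.
p_single_gene = hypergeom_sf(1, 5000, 2, 20)
print(round(p_single_gene, 4))  # about 0.008

# Hypothetical raw p-values for four pathways tested for the same TF.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

      In this toy case a single overlapping gene gives a raw p-value of about 0.008 because the pathway is very small. Whether such a result stays below 0.05 after adjustment depends on how many pathways are tested.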

      Minor points

      (1) Lines 167-168: I may not understand correctly, but you might want to say "downward-pointing edges" instead of "upward-pointing edges".

      Thank you for your correction. We revised ‘upward-pointing edges’ to ‘downward-pointing edges’. Please see Line 166.

      (2) Line 174: "physical interactions" should be amended to "direct interactions".

      Thank you for your correction. We revised ‘physical interactions’ to ‘direct interactions’. Please see Line 177.

      (3) Line 224: Could you please explain why bacterial growth in plant tissues is considered an example of "multi-stability"?

      Thank you for your suggestion. We are sorry for the incorrect statement. We showed ‘plant intercellular spaces’ as ‘multi-stability’. We revised the sentence to ‘These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as plant intercellular spaces’. Please see Lines 224-226.

      (4) Line 254-257: Here, the definition of "tether binding" is introduced, but it is not very clear to me. In my understanding, tethered binding is an indirect binding of a TF to a target gene through protein-protein interaction with other TF that directly binds to the promoter of the target gene.

      Thank you for your suggestion, and we agree with you. We referred to the paper published in 2012 (Wang et al., 2012) and revised the statement of ‘tether binding’ to ‘This finding suggested that these TFs indirectly regulated target genes through protein-protein interaction with other TFs that directly binds to the promoters of target genes, a phenomenon defined as tethered binding’. Please see Lines 259-262.

      (5) Lines 341-343: Figure 3b shows qRT-PCR of hopAE1, not hrpR.

      Thank you for your correction. We revised ‘hrpR’ to ‘hopAE1’. Please see Line 349.

      (6) Lines 500 and Figure 6b: It is hard to see edges from module 12 to others. So, it would be better to provide numeric information (number of TFs and target genes) in the text.

      Thank you for your suggestion. Module 12 includes 22 TFs and 318 target genes. We added the statement of numeric information about Module 12 in the revised manuscript. Please see Lines 536-537.

      (7) Line 519: Figure S4b is not the EMSA data for PSPPH3798. Should it be Figure S4e?

      Thank you for your correction. We revised to ‘Figure S4e’. Please see Line 545.

      (8) Line 522: Figure S6b is not relevant to the statement here.

      Thank you for your correction. We deleted the ‘Figure S6b’ here. Please see Line 547.

      (9) Line 593: prokaryotic transcriptional regulatory networks -> eukaryotic transcriptional regulatory networks?

      Thank you for your correction. We revised ‘prokaryotic transcriptional regulatory networks’ to ‘eukaryotic transcriptional regulatory networks’. Please see Line 618.

      (10) Figure S3 requires images of higher resolution. Especially, values for the color codes are not readable or very hard to see.

      Thank you for your suggestion. To make the images clearer, we enlarged them, changed the color codes, and divided the figure into three figures. Please see the revised Figures S3-S5 and the corresponding figure legends at Lines 1191-1206.

      Reviewer #3 (Recommendations For The Authors):

      (1) Some conclusions are not backed by quantitative or statistical analyses, and they are sometimes overinterpreted.

      L221: "Taken together, the simplest and most effective submodule M1 and the coregulatory submodule M13 played crucial roles in the transcriptional regulation of TFs in P. syringae."

      The authors did not provide any evidence supporting the functional importance of any of these submodules. M13 is most enriched within the locked loop, but its size is much smaller than simple loops. What evidence supports the importance of this particular submodule?

      Thank you for your suggestion. In the eukaryote Saccharomyces cerevisiae and the prokaryote Escherichia coli, which have the best-characterized transcriptional regulatory networks, the feed-forward loop (called M13 in this article) appears numerous times in the networks and performs different biological functions. M1 appeared more frequently than the other modules by an order of magnitude. We revised the sentence to ‘Taken together, the most numerous but simplest submodule M1 played a crucial role in the transcriptional regulation of TFs in P. syringae.’ Please see Lines 222-224.

      L223: "...we found 92 auto-regulators...These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as in plant intercellular spaces where bacteria grow (Figure 1d)(Alon, 2007). These regulators are regarded as bistable switches that further influence the expression of downstream genes."

      Are these claims supported by any evidence?

      Thank you for your suggestion. We referred to the following articles:

      (1) Alon. Nature Reviews Genetics. 2007 (Alon, 2007).

      Repression by transcription factors of the transcription of their target genes is considered negative regulation. Negative autoregulators account for half of the repressors in E. coli and occur in many eukaryotes. These repressors control the concentration of the target product by suppressing its expression, which accelerates the return of cells to the steady state.

      (2) Becskei. et al. Nature. 2000; Rosenfeld et al. Journal of Molecular Biology. 2002 (Becskei & Serrano, 2000; Rosenfeld, Elowitz, & Alon, 2002).

      Fluorescence assays confirmed that the negative autoregulatory module (the negative autoregulator TetR) took less time to reach the log phase than the unregulated group, and that it reduced cell-to-cell fluctuations in the steady-state level of the transcription factor. Some negative autoregulators were shown in these studies, such as LexA, CysB and SrlA-D.

      In our research, we also identified many autoregulators including CysB and LexA2 (annotated as LexA repressor). We revised the sentence to ‘In addition, we found 92 auto-regulators in our hierarchy network. These auto-regulators are important and always act as repressors in scenarios of multi-stability, such as plant intercellular spaces (Figure 1d) (Alon, 2007). For example, LexA and CysB as negative autoregulators were indicated to reduce cell-to-cell fluctuations in the steady-state level of the transcription factor (Becskei & Serrano, 2000; Rosenfeld et al. 2002).’. Please see Lines 224-229.

      L265: "This finding indicated that the bottom-level TFs, which were more easily regulated, tended to cooperate with downstream genes and other intra-level TFs."

      Could the authors provide more explanation to reach this conclusion from the data? Analyzing the number of highly co-accessing TFs does not sufficiently support this conclusion. The clustering of TFs (C1-C4) is incomplete, and each TF level (Top/Middle/Bottom) contains different numbers of TFs. Since the authors calculated all-by-all co-association scores for these 125 TFs, they can group these scores into 6 possible combinations (TT, TM, TB, MM, MB, BB) and show the distribution of co-association scores.

      Thank you for your suggestion. We indicated that the bottom-level TFs preferred to regulate the target genes through the cooperation with other TFs. To further support the claim, we analyzed the proportion of the bottom TF interaction in all the TF pairs interactions and direct interaction based on results in Figure 1B. The interactions of bottom TFs were 43% and 49%, respectively. However, the interactions of top TFs and middle TFs were only 20% and 28%, respectively. We revised the statement ‘Based on the analysis in Figure 1B, we found that the proportions of bottom-level TF interaction in all the TF pair interactions and direct interaction were 43% and 49%. These results indicated that the bottom-level TFs tended to regulate downstream genes through cooperating with other level TFs.’ in the revised manuscript. Please see Lines 269-272.

      As not every TF showed co-association with other TFs, we only collected the 125 TFs with co-association scores. As for the number of TFs in each level, we divided the TFs into three levels according to hierarchy height: heights from -1 to -0.3 represent the bottom level, heights from -0.3 to 0.3 represent the middle level, and heights from 0.3 to 1 represent the top level. Each level was equally divided by height scores. We suggest that the different numbers of TFs in the three levels reflect a characteristic of transcriptional regulation in P. syringae.
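
      The height-based assignment described above can be written as a small function. This is a minimal sketch rather than the authors' implementation: the function name is hypothetical, and the handling of the boundary values (-0.3 and 0.3 are placed in the middle level here) is an assumption, because the response does not say which level they belong to.

```python
def hierarchy_level(height, low=-0.3, high=0.3):
    """Map a hierarchy height in [-1, 1] to 'bottom', 'middle' or 'top'.

    Assumption: the cut-offs themselves (low and high) count as 'middle'.
    """
    if not -1.0 <= height <= 1.0:
        raise ValueError("hierarchy height must lie between -1 and 1")
    if height < low:
        return "bottom"
    if height <= high:
        return "middle"
    return "top"


# Hypothetical heights, for illustration only.
for h in (-0.8, 0.0, 0.75):
    print(h, hierarchy_level(h))
```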

      Thank you for your suggestion. As the co-association patterns were determined by the co-association scores of TFs in the same level, we first grouped the co-association scores into 3 possible TF pairs (TT, MM, and BB, in Figures S3a, S4a and S5a). Our results indicated that higher co-association scores tended to occur among bottom-level TFs. We revised the statement in the revised manuscript. Please see Lines 244-252.
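
      For reference, the six-way grouping suggested by the reviewer (TT, TM, TB, MM, MB, BB) could be sketched as below. This is an illustration under stated assumptions, not the analysis performed in the manuscript: the TF names, levels and scores are hypothetical placeholders, and the input is assumed to be a mapping from unordered TF pairs to co-association scores.

```python
from collections import defaultdict

LEVEL_CODE = {"top": "T", "middle": "M", "bottom": "B"}
LEVEL_ORDER = "TMB"  # fixes the label order, so ("bottom", "top") gives "TB"


def pair_label(level_a, level_b):
    """Return one of TT, TM, TB, MM, MB, BB for two hierarchy levels."""
    codes = (LEVEL_CODE[level_a], LEVEL_CODE[level_b])
    return "".join(sorted(codes, key=LEVEL_ORDER.index))


def group_scores(levels, scores):
    """Collect co-association scores by level pair.

    levels: dict mapping TF name -> 'top' | 'middle' | 'bottom'
    scores: dict mapping (tf_a, tf_b) -> co-association score
    """
    groups = defaultdict(list)
    for (tf_a, tf_b), score in scores.items():
        groups[pair_label(levels[tf_a], levels[tf_b])].append(score)
    return dict(groups)


# Hypothetical placeholder data, not values from the study.
levels = {"tf1": "top", "tf2": "bottom", "tf3": "bottom"}
scores = {("tf1", "tf2"): 0.05, ("tf2", "tf3"): 0.6, ("tf3", "tf1"): 0.1}
print(group_scores(levels, scores))
```

      The per-group score lists can then be summarized or plotted as distributions, one per level pair.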

      (2) Some figures and analyses are not well explained, and I was not able to understand them.

      Figure 1b: The terms "direct," "indirect," and "cooperativity" require further clarification as their definitions in the text (L169-183) are unclear to me. This ambiguity hampers the evaluation of the authors' discussion regarding TF-TF interactions (L561-584), an important theme of this study. The figure includes concepts discussed in later sections (e.g., cooperativity), making it difficult to understand. A diagram explaining these concepts would be highly helpful for readers to understand.

      Sorry for the missing information. We defined ‘indirect interaction’ as ‘co-association’, ‘cooperativity’ as ‘if the common target of two TFs is from a TF’. We added the definition of "indirect interaction" and "cooperativity" in the revised manuscript and legend. Please see Lines 174-176 and 1085-1087.

      L253: "Notably, we found that TFs at the top level, without cooperating TFs, exhibited a large number of binding peaks (Figure S3a)."

      I could not understand this sentence. Did the authors mean that top-level TFs with a large number of peaks showed a low level of co-association? If so, does this data suggest that these TFs do not tend to cooperate with other TFs? I was confused by the discussion in L253-L261.

      Thank you for your comment, and we agree with you. The low co-association scores and large peak numbers of these top-level TFs indicated that top-level TFs preferred to solely regulate target genes, but not to co-regulate with other top-level TFs.

      Thank you for your comment. From L253-256, PSPPH4700 was an example to show that top-level TFs with low co-association scores and large peak numbers tend to solely regulate target genes, but not to co-regulate with other top-level TFs. We revised the sentence to ‘For example, the top-level TF PSPPH4700 yielded over 1,700 peaks, but cooperated with only 24 top-level TFs with low co-association scores about 0.05 (Supplementary Table 2b).’.

      From L257-261, we analyzed high co-association scores of 125 TFs in three levels and further determined the co-association patterns. To identify the tendency of co-association of all these 125 TFs, the co-association patterns were classified into 4 clusters. Bottom-level TFs tend to co-regulate target genes with other TFs. We revised the sentence. Please see Lines 262-264, 265-266 and 269-272.

      L287: "The analysis of the peak locations of MexT demonstrated that MexT showed closer co-association relationships with top-level TFs (Figure 2b)."

      I could not reach this conclusion by seeing Figure 2b. Additional explanation and/or data visualization would be appreciated.

      Thank you for your suggestion. In C1, C2 and C4, many bottom-level TFs showed co-association patterns with other TFs, especially other bottom-level TFs (shown in C4). To explore the regulatory pattern in C3, the peak locations in the target genes of MexT were compared with those of the TFs in C3: seven top-level TFs (PSPPH1435, PSPPH1758, PSPPH2193, PSPPH2454, PSPPH4638, PSPPH4998 and PSPPH3411), three middle-level TFs (PSPPH1100, PSPPH5132 and PSPPH5144) and four bottom-level TFs (PSPPH0700, PSPPH2300, PSPPH2444 and PSPPH2580). MexT showed higher co-association scores (above 60) with more top-level TFs. We therefore concluded that MexT has closer co-association relationships with top-level TFs. We added this statement to the revised manuscript. Please see Lines 291-296.

      Figure 6cd: What kind of enrichment analysis did the authors perform? Was any statistical test used? The figure only shows the number of genes, and sometimes the number is only 1 for a functional category. Can it be considered as significant enrichment?

      Thank you for your comment. We used a hypergeometric test in this analysis. Although only one gene was enriched in some pathways, the adjusted p-value was less than 0.05. We added the details to the revised manuscript. Please see Lines 533-534.

      L169: "The hierarchical network revealed a downward information flow, suggesting the prioritization of collaboration between different hierarchy levels."

      Can the authors please explain the logic behind this statement more in detail?

      Thank you for your comment. The hierarchical network showed different number of TFs in three levels (54 top-level TFs, 62 middle-level TFs and 147 bottom-level TFs), which indicated that more than half of TFs (bottom-level TFs) tend to be regulated by other TFs and then directly bound to target genes. This finding showed a downward regulatory direction of transcription regulation in P. syringae. We revised the statement in the revised manuscript. Please see Lines 167-170.

      (3) The Method section lacks depth, especially on data analyses.

      How did the authors define promoter regions of each gene? How were operons treated in their analyses? Was P. syringae 1448A used for their main ChIP-seq?

      Thank you for your comment. We defined the intergenic region before each TF sequence as the promoter region.

      As pHM1 plasmid carries its own constitutive promoter (lacZ promoter), we amplified the TF-coding sequence and cloned into the site following the promoter. The TF protein expression was activated by the promoter of plasmid.

      P. syringae 1448A was used for our main ChIP-seq. We added the details in the revised manuscript. Please see Lines 705 and 727-730.

      Figure S3: I am not sure how the GO analyses were done. For example, in the case of the top-level TF PSPPH4700, did the authors perform GO analysis on genes that are co-bound by PSPPH4700 and any other top-level TFs?

      Thank you for your comment and we agree with you. We performed GO analysis on genes that were co-bound by TF pairs in the same level. We added the details in the revised manuscript. Please see Lines 248-252.

      The analysis presented in Figure 6a needs more explanation of the methodology employed by the authors.

      Thank you for your comment. We added more details for the analysis in Figure 6a. Please see Lines 514-522.

      It is strongly recommended that the authors share their analysis codes so that others can reproduce the analyses.

      Thank you for your comment. We shared our analysis code on GitHub (https://github.com/dengxinb2315/PS-PATRnet-code), as stated in the Data Availability section. Please see Lines 800-801.

      (4) Other:

      Figure 3: I suggest putting additional panel labels to facilitate the interpretation of the figure.

      Thank you for your suggestion. We added detailed labels to Figures 3 and 4. Please see the revised Figures 3 and 4.

      I spotted several potential errors:

      L106: 170 TFs?

      Thank you for your comment, and we are sorry for the missing details. For the hierarchical network, we integrated the DNA-binding data of 170 TFs in this study and 100 TFs in our previous SELEX research. We added the details in the revised manuscript. Please see Lines 104, 147 and 159-160.

      L592: P. syringae not E. coli?

      Thank you for your comment. Here we discussed the hierarchical characteristics in E. coli. We revised the statement in the revised manuscript. Please see Line 618.

      L593: eukaryotic not prokaryotic?

      Thank you for your correction. Here we discussed the feedforward loops in our study. We revised the statement in the revised manuscript. Please see Line 618.

      References

      Alon, U. (2007). Network motifs: theory and experimental approaches. Nature Reviews Genetics, 8(6), 450-461.

      Becskei, A., & Serrano, L. (2000). Engineering stability in gene networks by autoregulation. Nature, 405(6786), 590-593.

      Rosenfeld, N., Elowitz, M. B., & Alon, U. (2002). Negative autoregulation speeds the response times of transcription networks. Journal of molecular biology, 323(5), 785-793.

      Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., . . . Cheng, Y. (2012). Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome research, 22(9), 1798-1812.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      The iron manipulation experiments are in the whole animal and it is likely that this affects general feeding behaviour, which is known to affect NB exit from quiescence and proliferative capacity. The loss of ferritin in the gut and iron chelators enhancing the NB phenotype are used as evidence that glia provide iron to NB to support their number and proliferation. Since the loss of NB is a phenotype that could result from many possible underlying causes (including low nutrition), this specific conclusion is one of many possibilities.

      We investigated the feeding behavior of flies using Brilliant Blue (Sigma, 861146)[1]. Our results showed that the amount of dye in the fly body was similar between the control group and the BPS group, suggesting that BPS had almost no effect on feeding behavior (Figure 3—figure supplement 1A).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There was a gap between the Pros nuclear localization and downstream targets of ferritin, particularly NADH dehydrogenase and biosynthesis. Could overexpression of Ndi1 restore Pros localization in NBs?

      Ferritin defects reduce the iron level, which leads to cell cycle arrest of NBs via ATP shortage, and cell cycle arrest of NBs probably results in NB differentiation[2, 3]. We have added this experiment in Figure 5—figure supplement 2. The result showed that overexpression of Ndi1 could significantly restore Pros localization in NBs.

      The abstract requires revision to cover the major findings of the manuscript, particularly the second half.

      We revised the abstract to add more major findings of the manuscript in the second half as follows:

      “Abstract

      Stem cell niche is critical for regulating the behavior of stem cells. Drosophila neural stem cells (Neuroblasts, NBs) are encased by glial niche cells closely, but it still remains unclear whether glial niche cells can regulate the self-renewal and differentiation of NBs. Here we show that ferritin produced by glia, cooperates with Zip13 to transport iron into NBs for the energy production, which is essential to the self-renewal and proliferation of NBs. The knockdown of glial ferritin encoding genes causes energy shortage in NBs via downregulating aconitase activity and NAD+ level, which leads to the low proliferation and premature differentiation of NBs mediated by Prospero entering nuclei. More importantly, ferritin is a potential target for tumor suppression. In addition, the level of glial ferritin production is affected by the status of NBs, establishing a bicellular iron homeostasis. In this study, we demonstrate that glial cells are indispensable to maintain the self-renewal of NBs, unveiling a novel role of the NB glial niche during brain development.”

      In Figure 2B Mira appeared to be nuclear in NBs, which is inconsistent with its normal localization. Was it Dpn by mistake?

      In Figure 2B, we confirmed that it is Mira. Moreover, we also provide a magnified picture in Figure 2B’, showing that the Mira mainly localizes to the cortex or in the cytoplasm as previously reported.

      Figure 2C, Fer1HCH-GFP/mCherry localization was non-uniform in the NBs revealing 1-2 regions devoid of protein localization potentially corresponding to the nucleus and Mira crescent enrichment. It is important to co-label the nucleus in these cells and discuss the intracellular localization pattern of Ferritin.

      We have revised the image to include the nuclear marker DAPI in Figure 2C. The result showed that Fer1HCH-GFP/Fer2LCH-mCherry did not co-localize with DAPI, indicating that Drosophila ferritin is predominantly distributed in the cytosol[4, 5]. Regarding the reviewer's concern, the GFP/mCherry signal in NBs came from ferritin overexpressed in glia, which probably explains the non-uniform signal.

      In Figure 3-figure supplement 3F, glial cells in Fer1HCH RNAi appeared to be smaller in size. This should be quantified. Given the significance of ferritin in cortex glial cells, examining the morphology of cortex glial cells is essential.

      In Figure 3—figure supplement 3F, we did not label single glial cells, so it was difficult to determine whether their size had changed. However, the chamber formed by the cellular processes of glial cells appears smaller in Fer1HCH RNAi. The glial chamber undergoes remodeling during neurogenesis, responding to NB signals to enclose the NB and its progeny[6]. Thus, the size of the glial chamber is regulated by NB lineage size. In our study, ferritin defects lead to low proliferation and thus a smaller lineage for each NB, which likely makes the chamber smaller.

      Since the authors showed that the reduced NB number was not due to apoptosis, a time-course experiment for glial ferritin KD is recommended to identify the earliest stage when the phenotype in NB number /proliferation manifests during larval brain development.

      We examined brains at different larval stages upon glial ferritin KD. At the second-instar larval stage, NB proliferation decreased significantly while NB number declined only slightly (Figure 5—figure supplement 1E and F), suggesting that the brain defect caused by glial ferritin KD first manifests at the second-instar larval stage.

      Transcriptome analysis on ferritin glial KD identified genes in mitochondrial functions, while the in vivo EM data suggested no defects in mitochondria morphology. A short discussion on the inconsistency is required.

      When examining mitochondrial morphology by in vivo EM, we focused on whether cristae were visible in mitochondria, a criterion used to determine whether ferroptosis occurs[7]. It is possible that other details of mitochondrial morphology were changed, but we did not examine them. To describe this result more accurately, we replaced “However, our observation revealed no discernible defects in the mitochondria of NBs after glial ferritin knockdown” with “However, our result showed that the mitochondrial double membrane and cristae were clearly visible whether in the control group or glial ferritin knockdown group, which suggested that ferroptosis was not the main cause of NB loss upon glial ferritin knockdown” in line 207-209.

      The statement “we found no obvious defects of brain at the first-instar larval stage (0-4 hours after larval hatching) when knocking down glial ferritin (Figure 5-figure supplement 1C).” lacks quantification of NB number and proliferation, making it challenging to conclude.

      We have provided the quantification of NB number and proliferation rate of the first-instar larval stage in Figure 5—figure supplement 1C and D. The data showed that there is no significant change in NB number and proliferation rate when knocking down ferritin, suggesting that no brain defect manifests at the first-instar larval stage.

      A wild-type control is necessary for Figure 6A-C as a reference for normal brain sizes.

      We have added Insc>mCherry RNAi as a reference in Figure 6A-D, which shows that the brain in the tumor model is larger than the normal brain. Moreover, we moved the brat RNAi data from Figure 6A-D to Figure 6—figure supplement 1A-D for a better layout.

      In Figures 6B, D, “Tumor size” should be corrected to “Larval brain volume”.

      Here, we assessed the severity of the tumor by measuring the brain area in ImageJ rather than the brain volume from 3D data. We therefore think “Larval brain size” is more appropriate than “Larval brain volume”, and we have corrected “Tumor size” to “Larval brain size” in Figure 6B and D and in Figure 6—figure supplement 1B and D.

      Considering that asymmetric division defects in NBs may lead to premature differentiation, it is advisable to explore the potential involvement of ferritin in asymmetric division.

      aPKC is a classic marker for detecting asymmetric division defects in NBs. We performed aPKC staining and found that, judging by the position of the daughter cell, aPKC formed a crescent at the apical cortex in both control and glial ferritin knockdown brains (Figure 5—figure supplement 3A). This result indicated that there was no obvious asymmetric division defect after glial ferritin knockdown.

      In the statement "Secondly, we examined the apoptosis in glial cells via Caspase-3 or TUNEL staining, and found the apoptotic signal remained unchanged after glial ferritin knockdown (Figure 3-figure supplement 3A-D).", replace "the apoptosis in glial cells" with "the apoptosis in larval brain cells".

      We have replaced "the apoptosis in glial cells" with "the apoptosis in larval brain cells" in line 216.

      Include a discussion on the involvement of ferritin in mammalian brain development and address the limitations associated with considering ferritin as a potential target for tumor suppression.

      We have added the discussion about ferritin in mammalian brain development in line 428-430 and limitation of ferritin for suppressing tumor in line 441-444.

      Indicate Insc-GAL4 as BDSC#8751, even if obtained from another source. Additionally, provide information on the extensively used DsRed fly stock used in this study within the methods section.

      We provided the stock information of Insc-GAL4 and DsRed in line 673-674.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      The number of NBs differs a lot between experiments. For example, in Fig 1B and 1K controls present less than 100 NBs whereas in Figure 1 Supplementary 2B it can be seen that controls have more than 150. Then, depending on which control you compare the number of NBs in flies silencing Fer1HCH or Fer2LCH, the results might change. The authors should explain this.

      Figure 1 Supplementary 2B (Figure 1 Supplementary 3B in the revised version) shows NB number in the VNC region, while Fig 1B and 1K show NB number in the CB region. We first described the general phenotype by showing NB number in the CB and VNC separately (Fig 1 and Fig 1-Supplementary 1 and 3 in the revised version), and the NB number is consistent within each region. Thereafter, we focused on NB number in the CB for convenience.

      This reviewer encourages the authors to use better Gal4 lines to describe the expression patterns of ferritins and Zip13 in the developing brain. On the one hand, the authors do not state which lines they are using (including supplementary table). On the other hand, new Trojan GAL4 (or at least InSite GAL4) lines are a much better tool than classic enhancer trap lines. The authors should perform this experiment.

      All stock sources and numbers are documented in Table 2. The ferritin GAL4 and Zip13 GAL4 lines in this study are InSite GAL4 lines. In addition, we used another Fer2LCH enhancer-trap GAL4 (DGRC104255) to verify our result and provided the data in Figure 2—figure supplement 1. DsRed driven by Fer2LCH-GAL4 co-localized with the glial nuclear protein Repo rather than the NB nuclear protein Dpn, consistent with the result obtained with Fer1HCH/Fer2LCH GAL4. We will also try to obtain the Trojan GAL4 lines (Fer1HCH/Fer2LCH GAL4 and Zip13 GAL4) and validate this result in the future.

      The authors exclude very rapidly the possibility of ferroptosis based only on some mitochondrial morphological features without analysing the other hallmarks of this iron-driven cell death. The authors should at least measure lipid peroxidation levels in their experimental scenario, either with a kit that quantifies by-products of lipid peroxidation such as malondialdehyde (MDA) or using an anti-4-HNE antibody.

      We combined multiple experiments to exclude the possibility of ferroptosis. Firstly, ferroptosis can be blocked by iron chelators. We fed flies an iron chelator upon glial ferritin knockdown, but NB number and proliferation were not restored, which suggested that ferroptosis was probably not the cause of the NB loss induced by glial ferritin knockdown (Figure 3B and C). Secondly, Zip13 transports iron into the secretory pathway and further out of the cells in the Drosophila gut[8]. Our data showed that knocking down the iron transporter Zip13 in glia resulted in a decline in NB number and proliferation, consistent with the phenotype upon glial ferritin knockdown (Figure 3E-G). More importantly, simultaneous knockdown of Zip13 and ferritin aggravated the phenotype in NB number and proliferation (Figure 3E-G). These results suggested that the phenotype was induced by iron deficiency in NBs, which argues against iron overload or ferroptosis being the main cause of NB loss upon glial ferritin knockdown. Finally, we examined the mitochondrial double membrane and cristae, whose damage is a critical hallmark of ferroptosis, and found no significant damage (Figure 3-figure supplement 2E and F).

      In addition, we have added the 4-HNE determination in Figure 3—figure supplement 2G and H. The 4-HNE level did not change significantly, suggesting that lipid peroxidation was stable, which further argues against ferroptosis causing the NB loss upon glial ferritin knockdown.

      All of the above results together indicate that ferroptosis is not the cause of NB loss after ferritin knockdown.

      A major flaw of the manuscript is related to the chapter “Glial ferritin defects result in impaired Fe-S cluster activity and ATP production” and the results displayed in Figure 4. The authors talk about the importance of FeS clusters for energy production in the mitochondria. Surprisingly, the authors do not analyse the genes involved in this process; instead they present the interaction with the cytosolic FeS machinery, which has a role in some extramitochondrial proteins but no role in the synthesis of the FeS clusters incorporated into the enzymes of the TCA cycle and the respiratory chain. The authors should repeat the experiments incorporating the genes Nfs1 (CG12264), ISCU (CG9836), ISD11 (CG3717), and fh (CG8971) or remove (or at least rewrite) this entire section.

      We thank the reviewer for this constructive advice and have revised Figure 4B and C accordingly. We repeated the experiment while blocking mitochondrial Fe-S cluster biosynthesis by knocking down Nfs1 (CG12264), ISCU (CG9836), ISD11 (CG3717), and fh (CG8971), respectively. Nfs1 knockdown in NBs led to low proliferation, consistent with CIA knockdown. However, we did not observe an obvious brain defect upon ISCU (CG9836), ISD11 (CG3717), or fh (CG8971) knockdown in NBs. Our interpretation of these results is that Nfs1 is probably a necessary core component of Fe-S cluster assembly while the others are dispensable[9].

      The presence and aim of the mouse model is unclear to this reviewer. On the one hand, it is not used to corroborate the fly findings regarding the iron needs of neuroblasts. On the other hand, and without further explanation, the authors migrate from a fly tumor model based on modifying all neuroblasts to a mammalian model based exclusively on a glioma. The authors should clarify those issues.

      Although iron transporters probably differ between Drosophila and mammals, the function of iron as an essential nutrient for cell growth and proliferation is conserved from Drosophila to mammals. The fly data suggested that iron is critical for brain tumor growth, and we therefore tested this in a mammalian model. Glioma is the most common form of central nervous system neoplasm and originates from neuroglial stem or progenitor cells[10]. Therefore, we tested the effect of the iron chelator DFP on glioma in mice and found that DFP could suppress glioma growth and prolong the survival of tumor-bearing mice.

      Minor points

      Although it concerns adult flies, the authors did not include, either in the introduction or in the discussion, the existing literature on the expression of ferritins in glia or alterations of iron metabolism in fly glial cells (PMID: 21440626 and 25841783, respectively) or on the use of the iron chelator DFP in Drosophila (PMID: 23542074). The authors should check these manuscripts and consider incorporating them into their manuscript.

      Thank you for the reminder. We have incorporated all the recommended papers into our manuscript in line 65-67 and 168.

      The number of experiments in each figure is missing.

      All experiments were repeated at least three times, and we have stated this in the Quantifications and Statistical Analysis section of Materials and methods.

      If graphs are expressed as mean +/- sem, it is difficult to understand the significance stated by the authors in Figure 2E.

      We apologize for this mistake and have revised this in Quantifications and Statistical Analysis. All statistical results were presented as means ± SD.

      When authors measure aconitase activity, are they measuring all (cytosolic and mitochondrial) or only one of them? This is important to better understand the experiments done by the authors to describe any mitochondrial contribution (see above in major points).

      In this experiment, we measured total aconitase activity. We also tried to measure mitochondrial aconitase activity, but the attempt failed, possibly because of the low biomass of the tissue samples.

      Along these lines, why do the controls in the aconitase and ATP measurements lack an error bar? Are the statistical tests applied the correct ones? It is not the same to have paired or unpaired observations.

      This is a consequence of the normalization. We repeated these experiments at least three times, in different weeks, because the whole process (brain collection, protein determination, and ATP or aconitase determination) was time-consuming and labor-intensive. In addition, the efficiency of the aconitase and ATP kits changed over time, so we could not keep the experimental conditions identical across batches. Therefore, we normalized within each batch to obtain a more accurate result: the control value was divided by itself, giving 1, and the values of the other groups were divided by the control value. This normalization was repeated for each of the three batches, which is why the control group has no error bar. We think it is appropriate to apply ANOVA with a Bonferroni test to the three groups.
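      The per-batch normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `normalize_batch`, the group labels, and all numeric readings are hypothetical and chosen only to show why the control group ends up with zero spread.

```python
from statistics import mean, stdev


def normalize_batch(batch, control="control"):
    """Divide every group's reading in one batch by that batch's control reading."""
    reference = batch[control]
    return {group: value / reference for group, value in batch.items()}


# Hypothetical raw readings from three independent batches (arbitrary units).
batches = [
    {"control": 2.0, "Fer1HCH RNAi": 1.0, "Fer2LCH RNAi": 1.2},
    {"control": 3.0, "Fer1HCH RNAi": 1.8, "Fer2LCH RNAi": 1.5},
    {"control": 1.0, "Fer1HCH RNAi": 0.4, "Fer2LCH RNAi": 0.5},
]

normalized = [normalize_batch(batch) for batch in batches]

# Mean and standard deviation of each group across the normalized batches.
summary = {}
for group in batches[0]:
    values = [batch[group] for batch in normalized]
    summary[group] = (mean(values), stdev(values))

for group, (avg, sd) in summary.items():
    print(f"{group}: mean={avg:.3f}, SD={sd:.3f}")
```

      Because the control is divided by itself in every batch, its normalized value is exactly 1 each time and its standard deviation is 0, so no error bar can be drawn for it.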

      In some cases, further rescue experiments would be appreciated. For example, expression of Ndi restores control NAD+ levels or number of NBs, it would be interesting to know if this is accompanied by restoring mitochondrial integrity and its ability to produce ATP.

      We have determined ATP production after overexpressing Ndi1 and provided this result in Figure 4—figure supplement 1B. The data showed that expression of Ndi1 could restore ATP production upon glial Fer2LCH knockdown, which was consistent with our conclusion.

      Lines 293-299 on page 7 are difficult to understand.

      According to our results above, the decrease in NB number and proliferation upon glial ferritin knockdown (KD) was caused by energy deficiency. As shown in the schematic diagram (Author response image 1), “T” represents the total energy used for NB maintenance and proliferation, “N” the energy used for maintaining NB number, and “P” the energy used for NB proliferation, so that “T” equals “N” plus “P”. When ferritin was knocked down in glia, “T”, “N” and “P” all declined in “Ferritin KD” compared to “wildtype (WT)”. Knockdown of pros can prevent NB differentiation, but it cannot supply energy to NBs, which probably explains why NB number was rescued but proliferation was not. Specifically, NB number increased significantly in “Ferritin KD Pros KD” compared to “Ferritin KD”, so more energy was consumed for NB maintenance in “Ferritin KD Pros KD”. As shown in the schematic diagram, “T” did not change between “Ferritin KD Pros KD” and “Ferritin KD”, whereas “N” increased in “Ferritin KD Pros KD” compared to “Ferritin KD”. Thus, “P” decreased: less energy remained for proliferation, leading to the failure to rescue NB proliferation. Indeed, the level of proliferation in “Ferritin KD Pros KD” seemed even lower than in “Ferritin KD”.

      Author response image 1.

      The schematic diagram of relationship between energy and NB function in different groups. “T” represents total energy for NB maintenance and proliferation. “N” represents the energy for NB maintenance. “P” represents the energy for NB proliferation. T=N+P 
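      The energy budget T = N + P in the schematic can be made concrete with a short sketch. The numbers below are arbitrary units invented for illustration and are not measurements from the study; the function name `proliferation_energy` is likewise hypothetical.

```python
def proliferation_energy(total, maintenance):
    """Return P, the energy left for proliferation, given T = N + P."""
    if maintenance > total:
        raise ValueError("maintenance energy cannot exceed total energy")
    return total - maintenance


# Hypothetical budgets as (T, N) in arbitrary units.
conditions = {
    "WT": (10, 4),
    "Ferritin KD": (6, 2),          # T, N and P all lower than in WT
    "Ferritin KD Pros KD": (6, 4),  # same T as Ferritin KD, but more NBs to maintain
}

for name, (total, maintenance) in conditions.items():
    print(name, "P =", proliferation_energy(total, maintenance))
```

      With T unchanged and N raised by the rescue of NB number, P necessarily falls, which is the argument made for why proliferation is not rescued by pros knockdown.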

      Line 601 should indicate that Tables 2 and 3 are part of the supplementary material.

      We have revised this in line 678.

      Figure 4-supplement 1. Only validation of 2 genes from a RNAseq seems too little.

      Because of the low biomass of the fly brain, we had to dissect hundreds of brains to sort NBs, which is difficult and labor-intensive work. Most of the NBs were used for RNA-seq, so only a small amount of sample was left for validation, which was not enough to test more genes.

      Figure 6E, the authors indicate that 10 mg/ml DFP injection could significantly prolong the survival time. Which increase in % is produced by DFP?

      We have provided the bar graph in Author response image 2. The increase is about 16.67% by DFP injection.

      Author response image 2.

      The bar graph of survival time of mice treated with DFP.

      (The unpaired two-sided Student’s t test was employed to assess statistical significance. Statistical results were presented as means ± SD. n=7,6; *: p<0.05)
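      For reference, a percentage increase of this kind is computed relative to the control mean. The sketch below assumes hypothetical mean survival values (6 and 7, in arbitrary time units) that happen to reproduce a 16.67% increase; they are not the values behind the bar graph, and `percent_increase` is an illustrative name.

```python
def percent_increase(control_mean, treated_mean):
    """Relative increase of the treated mean over the control mean, in percent."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return (treated_mean - control_mean) / control_mean * 100.0


# Hypothetical mean survival times (arbitrary units), for illustration only.
increase = percent_increase(control_mean=6.0, treated_mean=7.0)
print(f"{increase:.2f}%")
```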

      Reviewer #3 (Recommendations For The Authors):

      As I read the initial results that built the story (glia make ferritin>release it> NBs take them up>use it for TCA and ETC) I kept thinking about what it meant for NBs to be 'lost'. This led me to consider alternate possibilities that the results might point to, other than the ones the authors were suggesting. It was only in Figure 5 that the authors ruled out some of those possibilities. I would suggest that they first illustrate how NBs are lost upon glial ferritin loss of function before they delve into the mechanism. This would also be a place to similarly address that glial numbers and general morphology are unchanged upon ferritin loss.

      This recommendation provides a valuable guideline to build this story especially for researchers who are interested in neural stem cell studies. Actually, we tried this logic to present our study but found that there are several gaps in the middle of the manuscript, such as the relationship between glial ferritin and Pros localization in NB, so that the whole story cannot be fluently presented. Therefore, we decided to present this study in the current way.

      More details of the screen would be useful to know. How many lines did they screen, what was the assay? This is not mentioned anywhere in the text.

      We have added this information to the Screen section of Materials and methods. We screened about 200 lines targeting components of classical signaling pathways, genes highly expressed in glial cells, or genes encoding secretory proteins. UAS-RNAi lines were crossed with repo-Gal4, and third-instar F1 larvae were dissected. We performed immunostaining of the F1 larval brains for Dpn and PH3 and then imaged the brains by confocal microscopy.

      Many graphs seem to be repeated in the main figures and the supplementary data. This is unnecessary, or at least should be mentioned.

      We appreciate your kind reminder. However, we carefully went through all the figures and did not find the repeated graphs, though some of them look similar.

      The authors mention that they tested which glial subtypes ferritin is needed in, but don't show the data. Could they please show the data? Same with the other iron transport/storage/regulation. Also, in both this and later sections, the authors could mention which Gal4 was used to label what cell types. The assumption is that the reader will know this information.

      We have added the result of ferritin knockdown in glial subpopulations in Figure 1—figure supplement 2. However, given the number of iron-related genes tested, we did not acquire images for them; the results are recorded in Table 3.

      For all their images showing colocalisation, magnified, single-colour images shown in grayscale will be useful. For example, without the magnification, it is not possible to see the NB expression of the protein trap line in Figure 2B. A magnified crop of a few NBs (not a single one like in 2C) would be more useful.

      We have provided Figure 2A’, B’, D’ and Figure 3D’ as suggested.

      There are a lot of very specific assays used to detect ROS, NAD, aconitase activity, among others. It would be nice to have a brief but clear description of how they work in the main text. I found myself having to refer to other sources to understand them. (I believe SoNAR should be attributed to Zhao et al 206 and not Bonnay et al 2020.)

      We have added a brief description about ROS, aconitase activity, NAD in line 198-199, 229-231, and 269 as suggested.

      I did not understand the normalisation done with respect to SoNAR. Is this standard practice? Is the assumption that 'overall protein levels will be higher in slowly proliferating NBs' reasonable? This is why they state the need to normalise.

      The SoNar normalization is not standard practice, but we think it is reasonable in our case. According to our results, the expression levels of Dpn and Mira seemed higher upon glial ferritin knockdown, so we speculated that some proteins accumulate in slowly proliferating NBs. We therefore used Insc-GAL4 to drive DsRed as an indicator of the Insc expression level and found that DsRed rose after glial ferritin knockdown, suggesting that Insc expression was indeed increased. Consequently, we had to normalize SoNar driven by Insc-GAL4 to DsRed driven by Insc-GAL4, which eliminates the effect of increased Insc expression upon glial ferritin knockdown.
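      The ratio-based correction described above can be sketched as follows. The intensities are hypothetical and only illustrate the reasoning: a raw sensor signal can rise simply because the driver is more active, and dividing by a reporter under the same driver removes that effect. The function name `normalized_sonar` is an assumption made for this example.

```python
def normalized_sonar(sonar_intensity, dsred_intensity):
    """SoNar signal divided by the DsRed signal driven by the same GAL4 line."""
    if dsred_intensity <= 0:
        raise ValueError("DsRed intensity must be positive")
    return sonar_intensity / dsred_intensity


# Hypothetical fluorescence intensities (arbitrary units).
control = normalized_sonar(sonar_intensity=100.0, dsred_intensity=50.0)
knockdown = normalized_sonar(sonar_intensity=120.0, dsred_intensity=80.0)

print(control, knockdown)
```

      In this made-up case the raw SoNar signal is higher in the knockdown (120 versus 100), yet the driver-corrected value is lower (1.5 versus 2.0), which is the kind of distortion the normalization is meant to remove.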

      FAC is mentioned as a chelator? But the authors seem to use it oppositely. Is there an error?

      FAC is a type of iron salt, which is used to supply iron. We have also indicated that in line 156 according to your advice. 

      The lack of any cell death in the L3 brain surprised me. There should be plenty of hemilineages that die, as do many NBs, particularly in the abdominal segments. Is the stain working? Related to this, P35 is not the best method for rescuing cell death. H99 might be a better way to go.

      We were also surprised by this result and repeated the experiment several times with both negative and positive controls. We also used TUNEL to validate it and obtained the same result. We will try to use H99 to rescue NB loss in the future, because it first needs to be integrated and recombined with our current genetic tools.

      It would be nice to see the aconitase activity signal as opposed to just the quantification.

      This method can only determine the absorbance for indicating aconitase activity, so our result is just the quantification.

      Glia are born after NBs are specified. In fact, they arise from NBs (and glioblasts). So, it's unlikely that the knockdown of ferritin in glia can at all affect initial NB specification.

      We completely agree with this statement.

      The section on tumor suppression seems out of place. The fly data on which the authors base this as an angle to chase is weak. Dividing cells will be impaired if they have inadequate energy production. As a therapeutic, this will affect every cell in the body. I'm not sure that cancer therapeutics is pursuing such broadly acting lines of therapies anymore.

      Our data suggested that iron/ferritin is more critical for highly proliferative cells. Tumor cells express high levels of TfR (Transferrin Receptor)[11], which can bind both transferrin and ferritin[12], and ferritin specifically targets tumor cells[11]. Thus, we think iron/ferritin is particularly essential for tumor cells. If an appropriate dose of an iron/ferritin inhibitor can be found that suppresses tumor growth while maintaining normal cell growth, iron/ferritin might be an effective target for tumor treatment.

      The feedback from NB to glial ferritin is also weak data. The increased cell numbers (of unknown identity) could well be contributing to the increase in ferritin. I would omit the last two sections from the MS.

      In brat RNAi and numb RNAi, increased cells are NB-like cells, which cannot undergo further differentiation and are not expected to produce ferritin. More importantly, we used Repo (glia marker) as the reference and quantified the ratio of ferritin level to Repo level, which can exclude the possibility that increased glial cells lead to the increase in ferritin.

      References

      (1) Tanimura T, Isono K, Takamura T, et al. Genetic Dimorphism in the Taste Sensitivity to Trehalose in Drosophila-Melanogaster. J Comp Physiol, 1982,147(4):433-7

      (2) Myster DL, Duronio RJ. Cell cycle: To differentiate or not to differentiate? Current Biology, 2000,10(8):R302-R4

      (3) Dalton S. Linking the Cell Cycle to Cell Fate Decisions. Trends in Cell Biology, 2015,25(10):592-600

      (4) Nichol H, Law JH, Winzerling JJ. Iron metabolism in insects. Annu Rev Entomol, 2002,47:535-59

      (5) Pham DQ, Winzerling JJ. Insect ferritins: Typical or atypical? Biochim Biophys Acta, 2010,1800(8):824-33

      (6) Speder P, Brand AH. Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife, 2018,7

      (7) Mumbauer S, Pascual J, Kolotuev I, et al. Ferritin heavy chain protects the developing wing from reactive oxygen species and ferroptosis. PLoS Genet, 2019,15(9):e1008396

      (8) Xiao G, Wan Z, Fan Q, et al. The metal transporter ZIP13 supplies iron into the secretory pathway in Drosophila melanogaster. Elife, 2014,3:e03191

      (9) Marelja Z, Leimkühler S, Missirlis F. Iron Sulfur and Molybdenum Cofactor Enzymes Regulate the Drosophila Life Cycle by Controlling Cell Metabolism. Front Physiol, 2018,9

      (10) Morgan LL. The epidemiology of glioma in adults: a "state of the science" review. Neuro-Oncology, 2015,17(4):623-4

      (11) Fan K, Cao C, Pan Y, et al. Magnetoferritin nanoparticles for targeting and visualizing tumour tissues. Nat Nanotechnol, 2012,7(7):459-64

      (12) Li L, Fang CJ, Ryan JC, et al. Binding and uptake of H-ferritin are mediated by human transferrin receptor-1. Proc Natl Acad Sci U S A, 2010,107(8):3505-10

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways. 

      Strengths: 

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study provides a unique regulation of DSB repair at rDNA repeats associated with the unique membrane-less subnuclear structure. 

      Weaknesses: 

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      We appreciate the succinct summary, and thank you for pointing out this insightful comment. Our data show that the dynamic interaction of PML with nucleolar caps can recognize and sequester damaged rDNA from the reactivated nucleolus. We propose that through this process, the actively transcribed intact rDNA is protected from possible detrimental interaction with the defective, PNAs-sequestered rDNA, most likely to avoid the harmful intra- and inter-chromosomal recombination events that would otherwise likely occur during recombinational repair of the damaged rDNA, as the rDNA repeats present on five chromosomes are highly repetitive. Thus, this novel sorting mechanism might help sustain the integrity of repetitive rDNA loci.

      Our data also indicate that the emergence of PNAs coincided with cell cycle arrest and preceded the establishment of cellular senescence. The senescent response to rDNA damage can primarily protect the genome from the instability of rDNA loci in a manner broadly analogous to that described for protecting the telomeric loci. This notion is supported by the lack of PNA formation in most cancer cells. In the broader context of the biological significance of cellular senescence at the organismal level, such robust response to hazardous rDNA damage in the individual affected cells may limit/prevent the sporadic occurrence of early cancerous lesions, at the expense of potential tissue adverse effects accumulating over time and thereby eventually contributing to organismal aging.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors aim to study the PML-nucleoli association (PNAs) by different genotoxic stress and to determine the underlying molecular mechanisms. 

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors have found that the inhibition of topoisomerases and RNA Polymerase I has the highest PNA formation associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that the Rad51-mediated HR pathway, but not the NHEJ pathway, is associated with the PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, the endonuclease I-PpoI induced DSBs at a defined location in rDNA and led to PNAs. 

      Most claims by the authors are supported by the data provided. However, below weaknesses/concerns may need to be addressed to improve the quality of the study. 

      (1) Top2B toxin doxorubicin had the highest degree of elevating PNAs; however, Top2B-knockdown had almost no noticeable effects on PNAs. How to reconcile the different phenotypes targeting Top2B? 

      We thank the reviewer for this comment and believe we can reconcile the results from doxorubicin treatments and the downregulation of TOP2A and B. 

      The different phenotypes can reflect the fact that doxorubicin targets both human TOP2 isoforms: TOP2A and TOP2B. Hence this treatment can limit any potential redundant roles of the individual topoisomerase subtypes, which, on the other hand, can be manifested under conditions when only one specific member is depleted genetically. On the other hand, it is also crucial to note that these isoforms are not fully functionally redundant. Each isoform reveals a characteristic expression pattern and distinct yet overlapping function (e.g. Nitiss J 2009, doi.org/10.1038/nrc2608, or Uusküla-Reimand 10.1126/sciadv.add4920). Thus, doxorubicin treatment or TOP2A KD can, contrary to TOP2B KD, trigger the formation of PNAs.   

      Additionally, besides topoisomerase inhibition and poisoning, doxorubicin intercalates DNA and elevates oxidative stress. Therefore, the observed effect of doxorubicin may also reflect, to some extent, its broader damaging impact on (r)DNA. On the other hand, the downregulation of individual topoisomerase isoforms shows how the restriction of their respective specific function/s may evoke (r)DNA damage.

      (2) To test the role of Rad51 and DNA-PKcs in PNA formation, the Rad51 inhibitor B02 and the DNA-PKcs inhibitor NU-7441 were used in the study. To further exclude possible off-target effects of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complementary approach to the pharmacological inhibitor approach.

      We followed this stimulating suggestion, and in the revised manuscript, we used pools of siRNAs (esiRNA) to target the mRNA of RAD51 or ligase IV (LIG4), to mimic the Rad51 chemical inhibitor B02 and the NHEJ (DNA-PK) inhibitor NU-7441, respectively. The relevant new data are presented in Figures 5F–I, 6E and F, Supplementary Figure 5D–H, and Supplementary Figure 6C–E. Notably, the results for rDNA damage-triggered PNA formation obtained using chemical inhibition of the repair pathways and the genetic approach (knockdown) were largely consistent, thereby supporting our original conclusions. There was one interesting partial difference when the B02 RAD51 inhibitor was compared with RAD51 knockdown, which we also comment on below. We suggest a plausible explanation: as is known for other DDR proteins such as PARP1, the functional inhibition of an expressed protein (here RAD51, by B02) may not necessarily phenotypically recapitulate the absence of such a protein (here RAD51 knockdown). Overall, we agree that this was a very important set of control experiments, which we additionally extended to cell cycle phase analysis.

      First, the LIG4 knockdown impacted I-PpoI-induced PNA formation in a way that followed the same trend as the effects caused by the NHEJ pathway inhibitor NU-7441, namely an increased frequency of PNA formation when NHEJ was impaired (Figure 5E and 5I). This was expected based on what we know about PNA formation: the NHEJ pathway is active throughout the cell cycle, and when this repair mode is not available in the nucleolus, more rDNA breaks remain unrepaired and must be transported to the nucleolar caps to be processed by the HR pathway, thereby also leading to more PNA structures formed under such conditions. In terms of cell cycle phases, the observed increase of I-PpoI-induced PNAs in cells with depleted LIG4 was more pronounced in S/G2 cells, when the PNA-promoting, cap-associated HR pathway is more active. Furthermore, the enhanced occurrence of I-PpoI-induced PNAs in cells depleted of LIG4 was counteracted (partly ‘rescued/prevented’) by the concomitant treatment with the RAD51 inhibitor B02 (Figure 5E and I; compare cells with esiLIG4 alone versus esiLIG4 + B02), overall consistent with the notion that the cap-associated HR pathway facilitates PNA formation.

      Second, in the analogous scenario of comparing the impact of the RAD51 chemical inhibitor (B02) with the siRNA-mediated knockdown of RAD51, the observed trends in the resulting frequencies of I-PpoI-induced PNAs were also largely consistent, in that both strategies of interfering with RAD51 resulted in fewer PNAs formed than in cells deficient in NHEJ. On the other hand, we must stress that after RAD51 knockdown, we did not observe a decline of PNAs compared to control cells, which was detected after B02 treatment (Figure 5E and I). However, when specifically considering the cell cycle position of the individual cells, these new analyses again revealed important similarities between the knockdown and chemical inhibition of RAD51 (Figure 6E, Supplementary Figure 6E).

      Before discussing the partial, cell-cycle-related difference between the impact of RAD51 chemical inhibition vs. knockdown, it is important to consider the PNA patterns seen in cells with activated I-PpoI that are proficient in both NHEJ and HR. The overall frequency of I-PpoI-induced PNA formation was higher in G1 than in S/G2 cells. Considering that persistent rDNA DSBs trigger the formation of PNAs, this result may reflect the very limited HDR during G1 phase, in contrast to the more efficient repair of I-PpoI-induced rDNA DSBs in S/G2, the cell cycle phase in which both NHEJ and HDR operate in parallel, the latter pathway offering a safer, error-free mechanism of DSB repair.

      Notably, when comparing the PNA formation frequency in cells treated with the chemical inhibitor of RAD51 (B02) or subjected to knockdown of RAD51, we strikingly observed that the decrease of I-PpoI-induced PNA formation upon RAD51 knockdown was apparent only for cells in G1 (Figure 6E and Supplementary Figure 6E). We believe that the distinct impact of RAD51 knockdown compared with that of the RAD51 inhibitor (mainly seen when S/G2 cells were analyzed separately) might reflect one or a combination of several factors, including e.g. the following:

      i) The knockdown-induced absence of RAD51 protein may allow access to the persistent DSB lesions by other alternative repair proteins (such as the RAD52-mediated repair reported in diverse pathophysiological circumstances including in cells undergoing senescence, a scenario very relevant for our present study). Such altered stoichiometry of proteins interacting with the persistent rDNA DSBs may contribute to a pattern of PNA formation that is distinct from the pattern seen in the presence of RAD51;

      ii) Another difference that we observe is the somewhat enhanced frequency of ‘spontaneous’ (i.e., even without activating the I-PpoI) PNAs formation when RAD51 is depleted, a phenomenon not seen when control non-targeting siRNA is transfected or when RAD51 is acutely inhibited by B02 (Figure 5H). Such spontaneous baseline PNA formation likely reflects the enhanced persistence of unrepaired endogenously occurring DNA lesions that are already suboptimally processed during the period following the esiRNA transfection, i.e., under stepwise depletion of the RAD51 protein which is normally required to deal with such omnipresent endogenous lesions occurring during e.g. DNA replication or some oxidative/metabolic processes; 

      iii) The knockdown approach, while clearly robustly depleting RAD51 protein levels (see Supplementary Figure 5D) may nevertheless leave a small residual fraction of the RAD51 protein present in the cells, thereby possibly inhibiting the HDR pathway to a slightly lesser degree than the B02 inhibitor;

      iv) Additionally, it should be noted that the baseline levels of I-PpoI-induced PNA formation are somewhat higher in the transfection experiments (i.e. when using any siRNA, even the non-targeting control siRNA), compared with the less ‘invasive’ experiments of simply adding a drug/solvent to the cell culture medium. This phenomenon adds to the above-baseline transient stress commonly seen (over decades, by us and many others) in cells exposed to transfections, which often causes even a moderate transient DNA damage response. Specifically, in control experiments, the level of I-PpoI-induced PNAs was around 15% in cells transfected with non-targeting siRNA, while in the comparable experiment of only I-PpoI induction under non-transfection conditions it was around 10%. In other words, the somewhat enhanced baseline counts of I-PpoI-induced PNAs seen in the knockdown experiments compared with chemical inhibitor experiments partly reflect the shift of the total readout counts due to the different baseline counts. This, however, does not alter the observed overall trends, which are consistent in both types of experiments.

      While the potential interpretation(s) of the above results are presented in the Discussion section of the revised manuscript, the full mechanistic elucidation of the impact of various experimental manipulations on the PNA formation during the cell cycle would require a dedicated follow-up study.

      (3) Several previous studies have shown the activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We agree this is an important issue, the solution of which (explained below) strengthens the mechanistic insights provided in our revised manuscript, and we are grateful to the reviewer for raising this question. To address this important point and even extend the scope from ATM also to ATR, we employed two small-molecule inhibitors of ATM (KU-60019 and KU-55933) and also one inhibitor of ATR (VE-822), at concentrations commonly used in analogous studies in the DNA damage response field, to examine their impact on rDNA damage/PNA formation induced by I-PpoI. The new data are shown in Figures 5A and B. We found that the inhibition of either of the two kinases alone robustly reduced the number of nuclei with PNAs, indicating that the activity of each of these two DNA damage signaling kinases is required for the formation of I-PpoI-induced PNAs in response to rDNA damage. Future experiments should elucidate precisely which of the very wide range of ATM/ATR substrates and/or specific protein domains and amino acid residues are instrumental in this rDNA damage signaling pathway to induce the formation of PNAs.

      Reviewer #3 (Public Review): 

      Summary: 

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs) termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase 1 (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNAs. A specific RAD51 inhibitor (B02) impacted the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited.

      Strengths: 

      PNA are morphologically distinct and readily visualized. The imaging data are high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks undermine PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that RNAPI and/or TOP1, TOP2A inhibition promote PNA structures. 

      Weaknesses: 

      (1) The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics. 

      We fully agree with the reviewer that live-cell imaging is critical to adequately capture PNA formation and evolution dynamics. While the data presented in this manuscript are based on quantifications of fixed cell images, all these analyses build on a detailed live-cell imaging examination of the dynamic behavior of PNAs that we reported in our original study on PNA formation as a biological phenomenon (Imrichova et al., doi: 10.18632/aging.102248; Epub 2019 Sep 7).

      In the revised version of our present manuscript, we better highlight the live-cell imaging study, in the Introduction section and further point out that the previous dynamic study was based on imaging of human cells ectopically expressing PML-EGFP and B23-RFP. Last but not least, to help the readers of this manuscript to understand the dynamics of PNA evolution, we have now also added an improved schematic figure that better illustrates the temporal dynamics of PNA stage transitions (Figure 1A).

      (2) Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division. 

      We agree that this is an important point. We previously published (Imrichova et al., doi: 10.18632/aging.102248) that exposure of RPE-1hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. In the revised manuscript, we added the analysis of how the I-PpoI-induced rDNA DSB affects the cell’s fate (Supplementary Figure 4J-N). Importantly, we found that most of the cells after I-PpoI-induced rDNA DSB also developed cellular senescence, and only 1–3% of cells eventually recovered from such rDNA stress to the extent that they were able to form colonies in a colony-forming assay. Thus, at the time of analysis, most of the cells were non-proliferating. 

      Additionally, in the revised manuscript, we included an analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I and Supplementary Figure 6C–E). Generally, we found that PNAs can be present in G1/S/G2. Nevertheless, the probability of occurrence in a particular cell cycle phase is affected by the type of treatment. For example, after I-PpoI-induced rDNA damage, the PNAs are primarily present in G1. In contrast, after the sole knockdown of RAD51 or TOP2A, the PNAs are present in S/G2 with higher probability. 

      (3) The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.  

      The reviewer is indeed correct in his/her interpretation of the PNA morphologies as a readout of the dynamic fate of the rDNA lesion. As mentioned in our response to the previous point no. 2 raised by this reviewer (see above), we described the dynamic structural PNA transitions in our previous article (Imrichova et al., doi: 10.18632/aging.102248).

      PNA progresses through distinct structures. Our results indicate that individual PNA subtypes are tied to specific processes. The PNA bowl-type is linked to the recognition of rDNA damage on the nucleolar periphery. The PNA funnel-type clusters several damaged rDNA loci from the nucleolus into PML-NDS, which is the ultimate structure that sequesters unrepaired rDNA away from the reactivated nucleolus.

      The formation of bowls, funnels, and balloons is linked to the inhibition of RNA polymerase I during the formation of nucleolar caps. In contrast, the later stage of PML-NDS is linked to RNA polymerase I reactivation. 

      We should mention that after the I-PpoI treatment, the ‘bowls’ and ‘funnels’ (observed originally in response to topoisomerase inhibitory drugs) are missing, and only PML-NDSs are formed. The apparent absence of the preceding stages of PNAs may reflect the lower extent of rDNA damage induced by I-PpoI treatment, without causing the pan-nucleolar RNA polymerase I inhibition that was observed for other treatments, such as doxorubicin.  

      (4) An I-PpoI targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would benefit comprehension.

      This is a very inspiring idea. Given the demanding nature of the required 3D analyses and the fact that this aspect is somewhat outside the scope of the present study, we plan to follow up on this issue in our future work, along with our efforts to localize the individual NORs using immuno-FISH after introducing the rDNA damage by I-PpoI.

      (5) Following I-PpoI induction, it is possible that cells arrest in a G1 state. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with the Reviewer. Indeed, our results showed that after a 24-hour period of I-PpoI induction, most cells (about 90%) are in the G1 phase of the cell cycle, consistent with the activation of the ATM/ATR checkpoint signaling and p53 activation that we observed. Therefore, this cell cycle effect can indeed explain why targeting NHEJ has a greater impact and causes the higher numbers of 53BP1 foci (and also γH2AX foci).

      (6) Conclusions: PNAs are a phenomenon of biological significance and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (ALT-associated PML-NBs), which are similar structures where telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.

      We are very grateful for this stimulating suggestion. In the Discussion of the revised manuscript, we now address the possible analogy between the APBs under ALT on the one hand, and the PNA formation on rDNA damage studied here, on the other. The following is the quote of the relevant paragraph of the revised Discussion: 

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      Our responses to recommendations from the Editors:

      (1) Since this paper does not provide a mechanistic insight into how the different PNAs form after DNA damage and PolI inhibition such as doxorubicin (DOXO) treatment and how HR modulates PNA formation, it is very important to provide some experimental data for those. For example, as the #3 reviewer suggested, a time-lapse analysis of PML and an rDNA marker after DOXO treatment and recovery, together with morphological analysis, would be beneficial.

      We fully agree that live-cell imaging is essential for a better understanding of the evolution and function of PNAs. The requested time-lapse analysis of the dynamics of the PNA morphological stages after DOXO treatment and recovery is available to the Reviewers and readers in our previously published article that reported the PNA phenomenon and the basic live-cell imaging data after doxorubicin treatment using the ectopically expressed PML-GFP and B23-RFP (Imrichova et al.; doi: 10.18632/aging.102248). In our present revised manuscript, we now refer to this work in the Introduction and further stress that those data were based on live-cell imaging, to better highlight this point along the lines recommended by the Reviewers. We have now also added an improved scheme that better explains the temporal dynamics of PNA transitions (Figure 1A).

      (2) In the same line as point #1, it is very important to show what kind of signaling pathway is necessary for PNA formation upon DSB formation with PolI inhibition. For example, as the #2 reviewer advised, the role of ATM or ATR could be tested by adding their inhibitor during the PNA formation. 

      Again, we fully agree that clarification of the signaling pathway required for PNA formation is crucial, and we are grateful for this stimulating recommendation. While the mentioned Reviewer no. 2 (in his/her Public comments) asked only about the role of ATM, the Editors rightly requested that we should use distinct inhibitors to test the respective roles of not only ATM but also ATR. As recommended, we have tested the importance of ATM and ATR kinase activities by inhibiting them during PNA formation. These newly generated data clearly showed that the activity of each kinase is essential for the efficient formation of PNAs, thereby providing a significant new mechanistic insight in the revised dataset. In the manuscript, these new results are now shown in Figures 5A and B. We also addressed this issue in the Public Review (Reviewer #2 point 3).

      (3) Given that the association of PML bodies with telomeres in ALT cells (ALT-associated PML Body, APB) is well established in the field, the authors need to mention this in the Introduction and also clearly compare in the Discussion how PNAs are similar to or different from APBs.

      We have followed this conceptually important recommendation exactly as suggested: i) We now mention the ALT-associated PML Body (APB) in the Introduction section (end of the second paragraph) and ii) we now compare, in much more detail, the similarities and differences between PNAs and APBs in the revised Discussion. We also address this issue in the document Response to Public Review (Reviewer #3 point 6). Indeed, we agree that this comparison is very fitting in the context of our dataset and informative for a broad audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points. 

      (1) None of the treatments shown in Figure 1B and 1C induced PNAs in most of the cells, with around 20% as a maximum value. Which time point(s) the authors checked should be stated clearly in the main text or the legend. The authors need to mention the kinetics of different PNA classes and/or dose-response effects at least for doxorubicin and BMH-21. Alternatively, a cell-cycle stage effect should be analyzed and/or discussed given that HR mainly operates in S and G2 phases.

      Thank you for pointing this out. We have now clarified the dose effects and also both analyzed and discussed the PNA formation vis a vis cell cycle stages, as recommended by this insightful reviewer.

      First, we have now added an experimental scheme to the Figures for better clarity regarding the time points examined, as suggested.

      Second, our results show that drug doses indeed affect the number and subtype of PNAs that form after such treatments. We show PNAs (types and number) after 0.5 – 5 – 50 µM camptothecin, topotecan, and etoposide (Supplementary Figure 1G and H) and after 0.375 – 0.56 – 0.75 µM doxorubicin (Figure 2A-D and Supplementary Figure 2E-G).  

      The very first detailed analysis of PNA evolution was presented in Imrichova et al. (doi: 10.18632/aging.102248.), where we described, using live-cell imaging, the relationship between the individual doxorubicin-induced PNA types, their transitions, and dynamics. We found that the highest number of nuclei with PNAs was present between 24 and 48 h after treatment initiation. Thus, we selected this time point for PNAs detection after treatments presented in Figure 1B.  

      We have now also added the distribution of nuclei based on the presence of specific PNA types into Supplementary Figure 1F.

      We included the analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I). A very detailed explanation of the observed cell cycle effects is presented in the document Responses to Public Review, re. Reviewer nr. 2, point 2, so please kindly read our response there.

      (2) Although the induction of PNAs by DSBs at rDNA repeats is clearly shown in the paper and modulated by DSB repair pathways, the biological significance of this sub-nuclear structure has not been addressed at all. Is the PNA required for efficient DSB repair per se or for pathway choice? Moreover, the PNA kinetics are peculiar. Once formed, the PNA did not show any turnover even after the DNA-damaging agents were washed away (Figure 4H). This structure is passed on to the next generation after cell division. Such dynamics of PNAs should be carefully addressed.

      The reviewer is correct in that the fate of the PNA and the potential biological significance of this phenomenon required a better explanation. The majority (≈97%) of cells after I-PpoI induction undergo cellular senescence, and therefore, we suppose that the PNA structures are not passed into the next cell cycle, as the bulk of the cells do not proliferate/cycle after such treatments. In this regard, it should be noted that PNAs (PML-NDS) are associated with replicative senescence of human mesenchymal stem cells (our old publication: Janderova-Rossmeislova 2007; doi: 10.1016/j.jsb.2007.02.008). To answer the comment of this reviewer, we have actually never observed that the cells with PNA present would be able to enter mitosis. Based on these findings, we suggest that damage to the repetitive rDNA loci, such as in our experiments in the form of DSBs, could commonly result in unsuccessful repair attempts leading to cellular senescence due to rDNA damage signaling, consistent with our new experiments highlighting the key role of the signaling mediated by the major DNA damage response kinases ATM and ATR, including the role of PNAs formation. For more details, please see also our response to Point 2 raised by the editors, on page 1 of this document, as well as our Public review response to Referee nr. 2, his/her points 2 and 3.

      From a broader perspective, relevant to the biological function of PNAs in this unorthodox cellular stress response, we showed that doxorubicin-induced PML-NDSs separate/sequester persistent rDNA DSBs from the regions of active pre-rRNA transcription. Again, the purpose of this process is not entirely clear at present. However, such separation of unrepaired rDNA from the rest of the genome could have a protective function, thereby limiting the risk of aberrant homologous recombination among hundreds of the repetitive, recombination-prone rDNA copies spread across five chromosomes. It should be stressed that PNAs are rarely seen in cancer cells, and their absence might be linked to the rDNA instability commonly seen in transformed cells. 

      As published in our previous study (Imrichova et al.; doi: 10.18632/aging.102248), we followed the fate of individual PML-NDS (the last stage of PNA) after the recovery from doxorubicin treatment using live-cell imaging. We observed that the fate of this structure could be diverse. Some of them persisted in the nucleus for many hours, but a portion of them disappeared. Their disappearance may be a manifestation of successful rDNA repair. However, what remains unresolved is why these cells do not reenter the cell cycle and instead develop a senescent phenotype, possibly reflecting some paracrine effects of a cocktail of diverse cytokines and chemokines secreted by the neighboring cells, a phenomenon well established in the senescence field as SASP (senescence-associated secretory phenotype).

      Notably, during the recovery phase from I-PpoI insult, some of the PML-NDS, in fact, increase in size over time (please refer to the graph in Author response image 1). This enlargement suggests ongoing processes within these structures. Additionally, the sequential accumulation of DHX9 (a multifunctional DNA/RNA helicase) in PNAs during recovery from the I-PpoI insult (as shown in Figure 4G and Supplementary Figure 4H in the revised manuscript) supports the hypothesis that PNAs are associated with as-yet poorly understood process(es). 

      Author response image 1.

      A scatter plot shows the changes in PNA diameters during the recovery phase from a 24-hour-long expression of I-PpoI.

      Last but not least, again relevant for the potential biological role of PNAs, we now also discuss the partial analogy of these structures with the PML-association with telomeres in cells that maintain their telomeres by the ALT recombinational process, as suggested by Referee no. 3 in the public review process. As this consideration addresses also the biological significance of the diverse PML associations and particularly our thoughts about the PNA, we copy/paste this paragraph from the Discussion section of our revised manuscript here, for the convenience of the Reviewer:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      (3) The association of PNA with DSB repair is shown by the colocalization with 53BP1 (Figures 3-5) and the kinetics of DSB repair were assessed by 53BP1 kinetics (Figure 5B). The authors need to check the colocalization of other DSB repair factors in homologous recombination (RPA and RAD51) and nonhomologous end joining (KU) and the kinetics of these DSB repair foci. 

      We are grateful for this very relevant suggestion. In response to this recommendation, we have examined additional markers linked to homologous recombination. In Figures 6A–D and Supplementary Figures 6A and B, we now also show the localization of RAD51 and RPA32 (pS33), along the lines recommended by this Reviewer.

      (4) In Figure 5B, 53BP1 foci in the "nucleolus" should be shown with that in the nucleus. 

      In the revised manuscript, we show histograms with a count of 53BP1 foci per nucleus.

      (5) The authors often used the words, "difficult-to-repair" and "easy-to-repair" DNA lesions. However, without the nature of these DNA lesions, it is early to distinguish the lesions. So, the authors should avoid them in the title, abstract, results, and figure legends. In Discussion, it is free to use them with a logical explanation. 

      Thank you for the recommendation. We have now changed the term “difficult-to-repair” to “persistent rDNA damage”, as this term better describes at face value the scenario encountered in these experiments. In the new version of the manuscript, we have now emphasized that PNAs are formed as a late response to rDNA damage. We added the observation that PNAs colocalized with rDNA lesions accumulated in the nucleolar cap (periphery of nucleolus), which are probably incompatible with NHEJ-mediated repair that otherwise occurs within the nucleolus. These persistent lesions contained phospho-RPA, a marker of resected DNA. However, RAD51 was not detected in such late lesions, indicating that the canonical RAD51-dependent HDR pathway is also restricted. Finally, we included a section defining such persistent DNA damage in the revised Discussion.

      Minor points: 

      (1) Page 5, second paragraph, line 6: "expression of PML". 

      (2) Page 5, line 6 from the bottom and Figure 1B: Actinomycin D is not a "specific" RNA polymerase I inhibitor. 

      (3) Page 6, first paragraph, last line: "DNA DSB" should be "DSB". 

      (4) Page 6, second paragraph, lines 6-7: What is the evidence of RNA polymerase I is active (need to explain to the readers)? 

      (5)  Figure 1D and main text: Please mention DOXO is the abbreviation of doxorubicin. 

      We are grateful for these points, which have now all been corrected in the revised version of the manuscript.

      (6) Page 6, third paragraph, line 4 and Figure 1D: What is "esi" not "si"TOP1. 

      In the revised manuscript, we explained what ‘esiRNA’ means; in fact, it is the pool of biologically prepared siRNAs targeting the mRNA of the protein being knocked down.

      (7) Figures 2A and 2B: The effect of B02 alone on PNA should be shown as a control.

      As recommended, the effect of B02 alone is now presented in Supplementary Figures 2A and B. 

      (8) Page 7, first paragraph, last three lines: It is hard to catch how the authors suggested the inhibition of RAD51 suppressed  RNAPI activity. If so, please  check the incorporation of 5FU. 

      Thank you for pointing out this confusing formulation. We have now removed from the revised manuscript the part of that original sentence: “which are predominantly associated with RNAPI inhibition”. 

      We observed that PML ‘balloons’ wrapped the nucleolus with the concomitantly observed complete inhibition of RNAPI in the nucleolus (Imrichova et al.; doi: 10.18632/aging.102248.). Nevertheless, we removed the original phrase from the revised version of the manuscript, as we agree with the reviewer that the causative relationship is so far lacking.

      (9) Page 7, second paragraph: It is critical to clarify what time B02 was added after DOXO removal or during DOXO treatment, or both.  

      We agree: In response we have now added the experimental scheme showing all these temporal details.

      (10) Figure 2H: The experiment lacks control with siTDP2 without etoposide treatment. 

      We did not include this control, unfortunately.

      (11) Page 8, third paragraph, line 3 from the bottom; "besides of rDNA probe, we also utilized probes" is better. 

      We changed this sentence in the revised manuscript, as recommended. 

      (12) Figure 3B: In these multi-color images, it is hard to see blue and gray in merged ones. It is better to show images with a single color. 

      We agree that grayscale is better to follow. However, this type of presentation would significantly increase the number of images, a circumstance we wished to avoid in this already rather image-heavy dataset. Instead, when it was possible, we elevated the intensity of fluorescence in colored images. The list of images with this adjustment is present in the public review. 

      We also inserted the example of the image in greyscale here as Author response image 2. 

      Author response image 2.

      Representative images of nucleoli show the localization of 53BP1 (red; a marker of DNA DSB), PML (green; a marker of PML-NB or PNAs), rDNA (blue), and DJ (white; a marker of the acrocentric chromosome) after doxorubicin treatment (2 days) or in the recovery phase (1 and 4 days). The merge of all channels is shown together with the presentation of individual images in greyscale. Scale, 5 µm.

      (13) Figure 4E: Please add values at D0. 

      We did not analyze the 53BP1 foci before adding Shield1 and doxycycline to induce the expression of I-PpoI (D0). However, as a control, we analyzed the 53BP1 foci in the cells treated for 24 h with the corresponding amount of DMSO as a mock treatment scenario (black line; NT).

      Reviewer #2 (Recommendations For The Authors): 

      (1) The data provided in this manuscript did not explicitly compare the easy-to-repair vs difficult-to-repair DNA lesions in rDNA, or at least lack quantitative measures with statistical analysis. Therefore, the title may need to be revised accordingly. 

      We agree, and the title has now been revised to better capture the persistent nature of the rDNA damage that evokes the PNA formation. Please see the response to Reviewer #1, Major points 5, presented above in this document.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Live imaging is paramount to understanding the dynamic nature of PNAs.  

      We agree that live-cell imaging is important. We have addressed this issue in detail in Response to Public review comments, of this Reviewer, as well as in the first point of this document in response to the Editors. In short, although the data presented in this manuscript are based on quantifications of fixed cell images, all these analyses benefit from our previous detailed live-cell imaging data that we reported – describing a careful examination of the dynamic behavior of PNAs in the study by Imrichova et al. (doi: 10.18632/aging.102248). To better illustrate the dynamic behavior of PNAs for the convenience of this reviewer, we include some data from our original article on this topic (referred to above): please see Author response image 3.

      Author response image 3.

      This Figure shows data published in Imrichova et al. (doi: 10.18632/aging.102248). PML IV-EGFP was ectopically expressed in RPE-1hTERT cells. The localization of PML was followed using live cell imaging. (A) The bowl (in this work named cap) originates from the accumulation of diffuse PML. (B) The transition between bowl (named cap), funnel (named fork), and balloon (named circle). (C + D) PML IV-EGFP (green) and B23-RFP (red) were ectopically expressed in RPE-1hTERT cells. The localization of both proteins was followed by live cell imaging. C – The formation of PML-NDS from the funnel is shown; D – The entire PNA cycle is shown: the PML-bowl formed on the border of the nucleolus, then transformed into the PML-funnel, and finally into PML-NDS.

      (2) The authors should consider cell cycle and cell proliferation in their analyses. 

      We are grateful for this recommendation, which echoes your own comment no. 2 in the Public reviews document. Briefly, as we explained in the response to the Public review, proliferation of PNA-containing cells is severely limited, as the vast majority of such cells enter a long-term arrest and cellular senescence. Furthermore, inspired by this comment, we have newly performed a series of experiments to address the frequencies of PNA formation vis-à-vis the cell cycle phase position of the individual cells with rDNA damage. In the revised manuscript, we now include the data from these analyses: see Figures 6E–I and Supplementary Figures 6C–E. Our response in the Public Review provides a detailed description of these results.

      (3) Merged fluorescent micrographs in red and green are potentially not discernible to individuals with colour-vision deficiencies. Consider re-colouring into schemes that are more accessible. 

      We agree that some readers may have different preferences about fluorescence micrographs. Here, we used the classical combination of green and red, commonly employed in the field.

      (4) Single-colour fluorescent micrographs are easier to visualize in grey-scale. Whenever a single colour is shown, it will help reader comprehension if the images are shown in this manner. 

      As recommended, we have changed Figures 4C, F, and G from a single-color presentation to a greyscale. 

      (5) There are many long paragraphs that are difficult to digest. I suggest where possible breaking this text into smaller portions (e.g. Page 10, pages 13-14, page 16-17). 

      Thank you for pointing this out. We have now broken the text into smaller portions (in several places), as recommended.

      (6) The B02 and NU7441 data would be bolstered by genetic confirmation (depleting RAD51, BRCA2 or PALB2 for HR, DNA-PK or LIG4 for NHEJ).

      As recommended, we downregulated Rad51 and LIG4 by RNA interference. New data are presented in Figures 5F–I, 6E, and F, Supplementary Figures 5D, E, F–H, and Supplementary Figures 6C–E. The Public Review provides a detailed description of these results and the ensuing conclusions.

      (7) Microscopy results are often qualitative (Fig S1I, S2L, S3A) and need to be bolstered with quantitative data. 

      We appreciate this recommendation and have implemented quantifications in several important microscopy results, as follows:

      S1I: The quantification of the number of cells with types of PNAs after esiTOP1 is present in Supplementary Figure 1L

      S2L: The quantification (% of nuclei with PNAs) is in Figure 2H

      S3A: In this immuno-FISH figure, we captured nuclei with and w/o PNAs. Using the SQUASSH analysis, we identified size-based colocalization between rDNA–PML and DJ–PML presented in Supplementary Figure 3C.

      (8) Stats or error bars are missing (Fig 1D, 2H, S1C-E, S1F, S2A S2D-G, S3E, S4E).

      We apologize for those omissions and we have amended this aspect of the study in the revised manuscript as much as possible:

      Figure 1D: For AMD and doxorubicin and CX-5461 and doxorubicin treatments, three and two biological replicates are shown separately in the same graph, respectively. For AMD and the knockdown of TOP1, the mean from three biological replicates is shown. All these results indicate an elevated number of PNAs when RNAPI is inhibited.

      Figure 2H: The error bars are present. For siTDP2, the percentage of cells was the same (4%) in all replicates; therefore, the error bar is not visible.

      Supplementary Figure 1C-E: Unfortunately, only one replicate (for all treatments) was analyzed by western blotting.

      Supplementary Figure 1F (in revised manuscript SF1G): The error bars are present. With this graph, we mainly wanted to present the variation in PNA types. 

      Supplementary Figure 2A (in revised manuscript SF2C): We include the whiskers 10-90 percentile and T-test.

      Supplementary Figure 2D-G (in revised manuscript SF2F-I): The error bars are present in all graphs. The changes in SF2F and G are not significant.

      Supplementary Figure 3E: This scheme shows the overlaps between rDNA and PML and rDNA and 53BP1. The column graph based on these data is shown in Figure 3F.

      Supplementary Figure 4E: The plot profiles representing the mean fluorescence of PML and B23 are shown for different time points. 

      (9) PNA characteristics remind this reviewer of the well-described ALT-associated PML nuclear bodies (APBs) found in immortalized cells lacking telomerase (i.e. Alternative lengthening of telomeres). I recommend the authors look to published data on APBs to help guide how to approach their research within a framework of the cell cycle.

      We fully agree with this insightful comment, and have addressed this point in the Discussion section of the revised manuscript, quoted the relevant studies also in the Introduction, and indeed explained the parallels and also differences of PNA versus APB (see also our response to point 3 highlighted also by the Editors, early in this rebuttal document).  We have also addressed this issue in the Public Review (Reviewer #3 point 6). We agree with the reviewer that this comparison will be of wide interest to readers, given the potential insights into the biological roles of APBs and PNAs.

      For convenience, we copy/paste the relevant new paragraph of the Discussion here:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.” 

      (10) Do PNAs mature/progress through the four distinct structures: bowl, to funnel, to balloon, and finally to PML-NDS? If true, this serves as a phenotypic read-out of damage induction (bowl) and repair (PML-NDS). It would suggest persistent unrepairable damage (0.56 or 0.75 µM doxorubicin) prevents repair, leading to the formation of all the PNA structures except PML-NDS, while lower-dose doxorubicin (0.375 µM) allows repair to occur, facilitating progression to the PML-NDS state, which is then inhibited with B02. 

      Again, this is a very insightful comment. Indeed, as the Reviewer suggests and as we explained e.g., in our response to point 1 raised by this reviewer, PNA progresses through four distinct structures/maturation stages. Our results indicate that individual PNA subtypes are tied to specific processes. PNA bowl-type is linked to the recognition of rDNA damage on the nucleolar surface. The PNA of the funnel-type clusters several rDNA loci from the nucleolus into PML-NDS, which is the ultimate structure sequestering unrepaired rDNA away from the reactivated nucleolus.

      There is a negative correlation between doxorubicin dose and occurrence of PML-NDS, and, indeed, blocking HDR with B02 combined with a lower doxorubicin dose results in a higher occurrence of all PNAs, including PML-NDS, that emerge in the recovery phase. These findings indicate that a more severe extent of rDNA damage, which is associated with inhibition of RNAPI activity, is linked to the PNA types associated with RNAPI inhibition (originally published in Imrichova et al.; doi: 10.18632/aging.102248). In contrast, a milder degree of rDNA damage induces the formation of PML-NDS.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate if at least 6 seconds have passed after the first nose poke - this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds on average. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe most recorded MSNs neurons decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition. 

      Major strengths: 

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used. 

      We are glad our main points came through to the reviewer.  

      Major weaknesses: 

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing? 

      This is a helpful comment. Our hypothesis was that D1 and D2 MSNs had similar patterns of activity. Our rationale is prior behavioral work from our group describing that blocking striatal D1 and D2 dopamine receptors had similar behavioral effects on interval timing (De Corte et al., 2019; Stutt et al., 2023). We rewrote our introduction with this idea in mind (Line 89):

      “We and others have found that striatal MSNs encode time across multiple intervals by time-dependent ramping activity or monotonic changes in firing rate across a temporal interval (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015; Wang et al., 2018). However, the respective roles of D2-MSNs and D1-MSNs are unknown. Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data led to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval. 

      We tested this hypothesis with a combination of optogenetics, neuronal ensemble recording, computational modeling, and behavioral pharmacology. We use a well-described mouse-optimized interval timing task (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). Strikingly, optogenetic tagging of D2-MSNs and D1-MSNs revealed distinct neuronal dynamics, with D2-MSNs tending to increase firing over an interval and D1-MSNs tending to decrease firing over the same interval, similar to opposing movement dynamics (Cruz et al., 2022; Kravitz et al., 2010; Tecuapetla et al., 2016). MSN dynamics helped construct and constrain a four-parameter drift-diffusion computational model of interval timing, which predicted that disrupting either D2-MSNs or D1-MSNs would increase interval timing response times. Accordingly, we found that optogenetic inhibition of either D2-MSNs or D1-MSNs increased interval timing response times. Furthermore, pharmacological blockade of either D2- or D1-receptors also increased response times and degraded trial-by-trial temporal decoding from MSN ensembles. Thus, D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either MSN type produced similar effects on behavior. These data demonstrate how striatal pathways play complementary roles in elementary cognitive operations and are highly relevant for understanding the pathophysiology of human diseases and therapies targeting the striatum.”

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances. 

      Regarding the results presented in Figures 2 and 3: 

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here. 

      We are glad the reviewer raised this point. First, regarding the components in noisy data, what the reviewer says is correct, but usually, the variance explained by PC1 is small. This is the reason we include scree plots in our PC analysis (Fig 3B and Fig 6G). When we compare our PC1s to variance explained in random data, our PC1 variance is always stronger. We have now included this in our manuscript:

      First, we generated random data and examined how much variance PC1 might generate. 

      We added this to the methods (Line 634)

      “The variance of PC1 was empirically compared against data generated from 1000 iterations of data from random timestamps with identical bins and kernel density estimates. Average plots were shown with Gaussian smoothing for plotting purposes only.”
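      The comparison against random data described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 12-bin layout, the 100 surrogate iterations, and the use of power iteration in place of a full PCA are all assumptions made for illustration, and the kernel density estimation step is omitted.

```python
import random

def bin_spikes(spike_times, n_bins=12, duration=6.0):
    """Spike counts in n_bins equal-width bins spanning [0, duration)."""
    counts = [0] * n_bins
    for t in spike_times:
        if 0.0 <= t < duration:
            counts[min(int(t / duration * n_bins), n_bins - 1)] += 1
    return counts

def pc1_fraction(matrix, power_iters=200):
    """Approximate fraction of variance explained by PC1.

    Rows are neurons, columns are time bins. The top eigenvalue of the
    covariance matrix is estimated by power iteration (hypothetical choice).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[c] for row in matrix) / n_rows for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (n_rows - 1)
            for b in range(n_cols)] for a in range(n_cols)]
    total = sum(cov[i][i] for i in range(n_cols))
    if total == 0.0:
        return 0.0
    v = [1.0 + 0.1 * i for i in range(n_cols)]  # arbitrary non-uniform start
    for _ in range(power_iters):
        w = [sum(cov[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
    rayleigh = sum(v[a] * cv[a] for a in range(n_cols)) / sum(x * x for x in v)
    return rayleigh / total

def null_pc1_fractions(spikes_per_neuron, n_iter=100, n_bins=12,
                       duration=6.0, seed=0):
    """PC1 fractions for surrogate ensembles with uniformly random timestamps."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_iter):
        surrogate = [
            bin_spikes([rng.uniform(0.0, duration) for _k in range(n)],
                       n_bins, duration)
            for n in spikes_per_neuron
        ]
        fractions.append(pc1_fraction(surrogate))
    return fractions

def empirical_p(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

      In this sketch, an observed PC1 fraction would be passed to `empirical_p` together with the surrogate values; a small p-value indicates that PC1 explains more variance than expected from random timestamps.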

      These data suggested that our PC1 was stronger than that observed in random data (Line 183):

      “PCA identified time-dependent ramping activity as PC1 (Fig 3A), a key temporal signal that explained 54% of variance among tagged MSNs (Fig 3B; variance for PC1 p = 0.009 vs 46 (44-49)% variance for PC1 derived from random data; Narayanan, 2016).”

      And in the pharmacology data (Line 367):

      “The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”

      Second, we note that we have used this analysis extensively in the past, and PC1 has always been identified as a linear ramping in our work and in work by others (Line 179):

      “Work by our group and others has uniformly identified PC1 as a linear component among corticostriatal neuronal ensembles during interval timing (Bruce et al., 2021; Emmons et al., 2020, 2019, 2017; Kim et al., 2017a; Narayanan et al., 2013; Narayanan and Laubach, 2009; Parker et al., 2014; Wang et al., 2018).”

      Third, we find that PC1 is highly correlated to the GLM slope (Line 205):

      “Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸).”

      Fourth, our goal was not to heavily interpret PC1 – but to compare D1 vs. D2 MSNs, or compare population responses to D2/D1 pharmacology. We have now made this clear in introducing PCA analyses in the results (Line 177):

      “To quantify differences in D2-MSNs vs D1-MSNs, we turned to principal component analysis (PCA), a data-driven tool to capture the diversity of neuronal activity (Kim et al., 2017a).”

      Finally, despite these arguments the reviewer’s point is well taken. Accordingly, we have removed all analyses of PC2 from the manuscript which may have been overly interpretative. 

      We have now removed language that interpreted the components, and we now find the discussion of PC1 much more data-driven. We have also removed much of the advanced PC analysis in Figure S9. Given our extensive past work using this exact analysis of PC1, we think that PCA, justified as the reviewer suggested, adds a considerable amount to our manuscript. 

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.  

      We agree – we now do exactly this analysis in Figure 3D. We now clarify this in detail, using the reviewer’s language to the methods (Line 648):

      “To measure time-related ramping over the first 6 seconds of the interval, we used trial-by-trial generalized linear models (GLMs) at the individual neuron level in which the response variable was firing rate and the predictor variable was time in the interval or nosepoke rate (Shimazaki and Shinomoto, 2007). For each neuron, its time-related “ramping” slope was derived from the GLM fit of firing rate vs time in the interval, for all trials per neuron. All GLMs were run at a trial-by-trial level to avoid effects of trial averaging (Latimer et al., 2015) as in our past work (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017b).”
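      As an illustration of a per-neuron ramping slope computed without trial averaging, the following is a minimal sketch rather than the analysis code used in the manuscript. It assumes a Gaussian GLM with identity link (equivalent to ordinary least squares), 1-second bins, and no nosepoke regressor; the function names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (Gaussian GLM with identity link)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def binned_rates(spike_times, duration=6.0, bin_width=1.0):
    """Bin centers (s) and firing rates (spikes/s) for one trial."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        if 0.0 <= t < duration:
            counts[min(int(t / bin_width), n_bins - 1)] += 1
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    rates = [c / bin_width for c in counts]
    return centers, rates

def ramping_slope(trials, duration=6.0, bin_width=1.0):
    """Slope of firing rate vs. time, pooling every bin of every trial.

    Each trial contributes its own (time, rate) pairs, so the fit is made
    on trial-level data rather than on a trial-averaged response.
    """
    times, rates = [], []
    for spike_times in trials:
        centers, trial_rates = binned_rates(spike_times, duration, bin_width)
        times.extend(centers)
        rates.extend(trial_rates)
    return ols_slope(times, rates)
```

      A positive slope corresponds to firing that increases over the interval and a negative slope to firing that decreases; slopes from neurons of each type could then be compared with a mixed model that accounts for variance between animals.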

      And to the results (Line 194):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015).”

      Relatedly, it seems that the data shown in Figure 2D *doesn't* support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types. 

      This likely refers to Figure 3D. The reviewer is correct that the changes in slope are small and near 0. Our goal was to show that D2-MSN and D1-MSN slopes were distinct – rather than increasing and decreasing. We have added this to the abstract (Line 46)

      “We found that D2-MSNs and D1-MSNs exhibited distinct dynamics over temporal intervals as quantified by principal component analyses and trial-by-trial generalized linear models.”

      We have clarified this idea in our hypothesis (Line 96):

      “These data led to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval.”

      We have added this idea to the results (Line 194)

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015). Nosepokes were included as a regressor for movement. GLM analysis also demonstrated that D2-MSNs had significantly different slopes (-0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – -0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)). We found that D2-MSNs and D1-MSNs had significantly different slopes even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F = 7.51, p = 0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F = 4.3, p = 0.04 accounting for variance between mice). Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸). These data demonstrate that D2-MSNs and D1-MSNs had distinct slopes of firing rate across the interval and were consistent with analyses of average activity and PC1, which exhibited time-related ramping.”

      And Line 215:

      “In summary, we used optogenetic tagging to record from D2-MSNs and D1-MSNs during interval timing. Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing. These data provide insight into temporal processing by striatal MSNs.”

      And in the discussion (Line 415):

      “We describe how striatal MSNs work together in complementary ways to encode an elementary cognitive process, interval timing. Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing.”

      We have now included a new plot with box plots to make the differences in Figure 3D clear.

      Other reviewers requested additional qualitative descriptions of our data, and we have referred to increases / decreases in this context. 

      Regarding the results in Figure 4: 

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data. 

      We are glad the reviewer raised these points. Our goal was to use neuronal activity to fit behavioral activity, not the reverse. While we understand the reviewer’s point, we note that one behavioral output (switch time) can be encoded by many patterns of neuronal activity; thus, we are not sure we can use the model developed for behavior to fit diverse neuronal activity, or an ensemble of neurons. We have made this clear in the manuscript (Line 251):

      “Our model aimed to fit statistical properties of mouse behavioral responses while incorporating MSN network dynamics. However, the model does not attempt to fit individual neurons’ activity, because our model predicts a single behavioral parameter – switch time – that can be caused by the aggregation of diverse neuronal activity.”

      To approximate what the reviewer suggested, we attempted to predict behavior directly from neuronal ensembles. We have now made this clear in the methods (Line 682):

      “Analysis and modeling of mouse MSN-ensemble recordings. Our preliminary analysis found that, for a sufficiently large number of neurons ($N > 11$), each recorded ensemble of MSNs on a trial-by-trial basis could predict when mice would respond. We took the following approach: First, for each MSN, we convolved its trial-by-trial spike train $Spk(t)$ with a 1-second exponential kernel $K(t) = w\,e^{-t/w}$ if $t > 0$ and $K(t) = 0$ if $t \le 0$ (Zhou et al., 2018; here $w = 1$ s). Therefore, the smoothed, convolved spiking activity of neuron $j$ ($j = 1, 2, \dots, N$),

      $$x_j(t) = (Spk * K)(t) = \int Spk(s)\,K(t - s)\,ds,$$

      tracks and accumulates the most recent (one second, on average) firing-rate history of the $j$-th MSN, up to moment $t$. We hypothesized that the ensemble activity $(x_1(t), x_2(t), \dots, x_N(t))$, weighted with some weights $\beta_j$, could predict the trial switch time $t^*$ by considering the sum

      $$x(t) = \sum_{j=1}^{N} \beta_j\,x_j(t)$$

      and the sigmoid

      $$\frac{1}{1 + e^{-(x(t) - 0.5)/k}}$$

      that approximates the firing rate of an output unit. Here parameter $k$ indicates how fast $x(t)$ crosses the threshold 0.5 coming from below (if $k > 0$) or coming from above (if $k < 0$) and relates the weights $\beta_j$ to the unknowns $\hat{\beta}_j = \beta_j/k$ and $\hat{\beta}_0 = -0.5/k$. Next, we ran a logistic fit for every trial for a given mouse over the spike count predictor matrix $[x_1(t), x_2(t), \dots, x_N(t)]$ from the mouse MSN recorded ensemble, and observed value $t^*$, estimating the coefficients $\hat{\beta}_0$ and $\hat{\beta}_j$, and so, implicitly, the weights $\beta_j$. From there, we compute the predicted switch time $t^*_{pred}$ by condition $x(t) = 0.5$. Accuracy was quantified by comparing the predicted switch time, within a 1-second window, to the actual switch time on a trial-by-trial basis (Fig S4).”
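      The prediction step described in this methods passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the weights are taken as given rather than estimated by the logistic fit, time is evaluated on a discrete grid, and all function and argument names are hypothetical.

```python
import math

def smooth_spikes(spike_times, t_grid, w=1.0):
    """Causal exponential smoothing of one neuron's spike train.

    Each spike at time s contributes K(t - s) = w * exp(-(t - s) / w) for t > s
    and nothing for t <= s, matching the kernel definition quoted above.
    """
    return [
        sum(w * math.exp(-(t - s) / w) for s in spike_times if s < t)
        for t in t_grid
    ]

def predicted_switch_time(ensemble_spikes, betas, t_grid, w=1.0, threshold=0.5):
    """First grid time at which the weighted ensemble activity reaches the threshold.

    ensemble_spikes: one list of spike times per neuron (hypothetical layout).
    betas: one weight per neuron, assumed to be already estimated.
    Returns None if the threshold is never reached on the grid.
    """
    smoothed = [smooth_spikes(spikes, t_grid, w) for spikes in ensemble_spikes]
    for i, t in enumerate(t_grid):
        x = sum(b * xj[i] for b, xj in zip(betas, smoothed))
        if x >= threshold:
            return t
    return None
```

      Comparing the returned time with the observed switch time on each trial, within a tolerance window, would give an accuracy measure of the kind reported in Fig S4.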

      And in the results (Line 254): 

      “We first analyzed trial-based aggregated activity of MSN recordings from each mouse ($x_j(t)$), where $j = 1, \dots, N$ neurons. For D2-MSN or D1-MSN ensembles of $N > 11$, we found linear combinations of their neuronal activities, with some $\beta_j$ coefficients,

      $$x(t) = \sum_{j=1}^{N} \beta_j\,x_j(t),$$

      that could predict the trial-by-trial switch response times (accuracy > 90%, Fig S4; compared with < 20% accuracy for Poisson-generated spikes of same trial-average firing rate). The predicted switch time $t^*_{pred}$ was defined by the time when the weighted ensemble activity $x(t)$ first reached the value $x(t) = 0.5$. Finally, we built DDMs to account for this opposing trend (increasing vs decreasing) of MSN dynamics and for ensemble threshold behavior defining $t^*_{pred}$; see the resulting model (Equations 1-3) and its simulations (Figure 4A-B).”

      And we have added a new figure, Figure S4, that demonstrates these trial-by-trial predictions of switch response times.  

      Note that we have included predictions from shuffled data, similar to what the reviewer suggested. Predictions are derived from neuronal ensembles on that trial; thus, we could not apply a leave-one-out approach to trial-by-trial predictions.

      These models are highly predictive for larger ensembles and poorly predictive for smaller ensembles.  We think this model adds to the manuscript and we are glad the reviewer suggested it. 

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).  

      Our model was inspired by the aggregate activity.  We have now made this clear in the results (Line 227): 

      “Our data demonstrate that D2-MSNs and D1-MSNs have opposite activity patterns. However, past computational models of interval timing have relied on drift-diffusion dynamics with a positive slope that accumulates evidence over time (Nguyen et al., 2020; Simen et al., 2011). To reconcile how these MSNs might complement each other to effect temporal control of action, we constructed a four-parameter drift-diffusion model (DDM). Our goal was to construct a DDM inspired by average differences in D2-MSNs and D1-MSNs that predicted switch-response time behavior.”

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition. 

      We have clarified that our parameters were chosen to best fit behavior (Line 266):

      “The model’s parameters were chosen to fit the distribution of switch-response times:

      $F = 1$, $b = 0.52$ (so $T = 0.87$), $D = 0.135$, $\sigma = 0.052$ for intact D2-MSNs (Fig 4A, in black); and $F = 0$, $b = 0.48$ (so $T = 0.13$), $D = 0.141$, $\sigma = 0.052$ for intact D1-MSNs (Fig 4B, in black).”

      Furthermore, we have clarified the approach to noise in the results (Line 247):  

      “The drift, together with noise 𝝃(𝒕) (of zero mean and strength 𝝈), leads to fluctuating accumulation which eventually crosses a threshold 𝑻 (see Equation 3; Fig 4A-B).”

      And Line 279: 

      “The results were obtained by simultaneously decreasing the drift rate $D$ (equivalent to lengthening the neurons’ integration time constant) and lowering the level of network noise $\sigma$: $D = 0.129$, $\sigma = 0.043$ for D2-MSNs in Fig 4A (in red; changes in noise had to accompany changes in drift rate to preserve switch response time variance); and $D = 0.122$, $\sigma = 0.043$ for D1-MSNs in Fig 4B (in blue). The model predicted that disrupting either D2-MSNs or D1-MSNs would increase switch response times (Fig 4C and Fig 4D) and would shift MSN dynamics.”
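      To make the effect of these parameter changes concrete, the sketch below simulates a drift-diffusion process with the quoted values. Equations 1-3 are not reproduced in this response, so the sketch assumes the form dx = (F - x) D dt + σ dξ(t) with x(0) = b and a switch when x first crosses T. That form, the time unit of seconds, the Euler-Maruyama step size, and the function names are assumptions for illustration and may differ from the manuscript's implementation.

```python
import math
import random

def simulate_switch_time(F, b, T, D, sigma, dt=0.1, t_max=60.0, rng=None):
    """Euler-Maruyama simulation of the assumed model; returns the first crossing of T.

    The process starts at b and relaxes toward F, so it rises toward T when T > b
    (D2-MSN-like) and decays toward T when T < b (D1-MSN-like).
    Returns None if T is not crossed before t_max.
    """
    rng = rng or random.Random(0)
    x, t = b, 0.0
    rising = T > b
    while t < t_max:
        x += (F - x) * D * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if (rising and x >= T) or (not rising and x <= T):
            return t
    return None

def noise_free_switch_time(F, b, T, D):
    """Closed-form crossing time of the assumed model when sigma = 0."""
    return math.log((b - F) / (T - F)) / D

# Under these assumptions, lowering D lengthens the noise-free crossing time.
intact_d2 = noise_free_switch_time(F=1.0, b=0.52, T=0.87, D=0.135)
disrupted_d2 = noise_free_switch_time(F=1.0, b=0.52, T=0.87, D=0.129)
```

      In this simplified form, decreasing the drift rate delays the threshold crossing for both the rising (D2-like) and decaying (D1-like) parameter sets, which is the direction of the predicted behavioral effect.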

      Regarding the results in Figure 6: 

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper. 

      We agree – we removed PC2 for these reasons. We have also noted that the primary reason for PC1 was to compare results of D2/D1 blockade (Line 362):

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”

      As noted above, PC1 does not explain this level of variance in noisy data.

      We also reworked Figure 6 to make the effects of D2 and D1 blockade more apparent by moving the matched sorting to the main figure: 

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result. 

      These are important suggestions. We changed our analysis to better capture the variability and main effects in the data, and we now include 3 individual raster examples, exactly as the reviewer suggested.

      As the reviewer suggested, we wanted to compare variability for *all* MSNs. We sorted the same MSNs across saline, D2 blockade, and D1 blockade sessions. We describe the sorting procedure in the methods (Line 618):

      “Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms. The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorted using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via correlation coefficients between sessions.”

      To confirm that we were able to track neurons across sessions, we quantified waveform similarity (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs D2 blockade r = 1.00 (0.99 – 1.00), rank sum vs correlations in unmatched waveforms p = 3 × 10⁻⁴⁴; saline vs D1 blockade r = 1.00 (1.00 – 1.00), rank sum vs correlations in unmatched waveforms p = 4 × 10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade 4.9 (2.4 – 7.8) Hz).”
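      The waveform-similarity measure referred to here is a correlation coefficient between a unit's mean waveform in two sessions. A minimal sketch of that computation follows; it assumes each waveform is a list of voltage samples of equal length, and the function name is hypothetical rather than taken from the analysis code.

```python
import math

def waveform_correlation(wave_a, wave_b):
    """Pearson correlation between two equal-length mean waveforms."""
    n = len(wave_a)
    mean_a = sum(wave_a) / n
    mean_b = sum(wave_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(wave_a, wave_b))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in wave_a))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in wave_b))
    return cov / (sd_a * sd_b)
```

      Because the correlation ignores amplitude scaling and offset, a unit whose spike shape is preserved across sessions scores near 1 even if its amplitude drifts; comparing these values against correlations between unmatched units gives the rank-sum test quoted above.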

      As noted above, this enabled us to compare activity for the same MSNs across sessions in a new Figure 6 (previously, this analysis had been in Figure S9), and we used PCA to quantify this variability.

      By tracking neurons across saline, D2 blockade, and D1 blockade, readers can see all the variability in MSNs. We added these data to the results (Line 362):  

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016). Interestingly, PC1 scores shifted with D2 blockade (Fig 6F; PC1 scores for D2 blockade: -0.6 (-3.8 – 4.7) vs saline: -2.3 (-4.2 – 3.2), F = 5.1, p = 0.03 accounting for variance between MSNs; no reliable effect of sex (F = 0.2, p = 0.63) or switching direction (F = 2.8, p = 0.10)). PC1 scores also shifted with D1 blockade (Fig 6F; PC1 scores for D1 blockade: -0.0 (-3.9 – 4.5), F = 5.8, p = 0.02 accounting for variance between MSNs; no reliable effect of sex (F = 0.0, p = 0.93) or switching direction (F = 0.9, p = 0.34)). There were no reliable differences in PC1 scores between D2 and D1 blockade. Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade. Taken together, this data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1. When combined with the major contributions of D1/D2 MSNs to PC1 (Fig 3C), these findings indicate that pharmacological D2 blockade and D1 blockade disrupt ramping-related activity in the striatum.”

      Finally, we included the data in which sessions were sorted independently and assumed to be fully statistically independent in a new Figure S10.

      And in the results (Line 376): 

      “Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade.”

      These changes strengthen the manuscript and better show the main effects and variability of the data. 

      Regarding the results in Figure 7: 

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this? 

      This was very unclear. The second classifier was predicting response time, but it was confusing, and we removed it. 

      Impact: 

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/ decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions. 

      As noted above, we clarified our hypothesis and implications, and strengthened several aspects of the data as suggested by this reviewer.  

      Reviewer #2 (Public Review): 

      Summary: 

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2-MSNs in cognitive processing. However, the data was incomplete to fully support this major finding. One major reason is the heterogeneous responses within the D1- or D2-MSN populations. In addition, there are additional concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis. 

      Strengths: 

      The authors used multiple approaches including awake mice behavior training, optogenetic-assisted cell-type specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing. 

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.  

      Weaknesses: 

      (1) More detailed behavior results should be shown, including the rate of the success switches, and how long it takes to wait in the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nosepokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nosepoke? 

      We completely agree. We have now completely revised Figure 1 to include many of these task details.

      We have clarified remaining details in the methods (Line 548):

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers, with two front nosepokes flanking either side of a food hopper on the front wall, and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated first ‘front’ nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for initial nosepokes at the second nosepoke location after 18 seconds when preceded by a nosepoke at the first nosepoke location. Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, or going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).

      Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate if more than 6 seconds have elapsed without receiving a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials in which an animal nosepokes in the wrong order or does not nosepoke are relatively rare and discarded. Consequently, we focused on these switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry and is distinct from switch response time when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit for the switch response times only. Trials were self-initiated, but there was an intertrial interval with a geometric mean of 30 seconds between trials.”
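      The timing definitions in this methods passage can be expressed as a short sketch. It assumes each trial's nosepokes are available as (entry, exit) time pairs in seconds from trial start, and it reads "the moment animals departed the first nosepoke before arriving at the second nosepoke" as the last first-port exit preceding the first second-port entry. Both the data layout and that reading are assumptions for illustration, and the function name is hypothetical.

```python
def switch_metrics(first_pokes, second_pokes):
    """Compute switch response time and traversal time for one trial.

    first_pokes, second_pokes: lists of (entry, exit) times in seconds from trial start.
    Returns None for error trials (responses at only one port, or second port first).
    """
    if not first_pokes or not second_pokes:
        return None  # animal responded at only one nosepoke
    second_entry = min(entry for entry, _ in second_pokes)
    exits_before = [exit_t for _, exit_t in first_pokes if exit_t <= second_entry]
    if not exits_before:
        return None  # no first-port departure preceded the second-port arrival
    switch_time = max(exits_before)
    return {
        "switch_time": switch_time,                      # departure from the first nosepoke
        "traversal_time": second_entry - switch_time,    # first-port exit to second-port entry
    }
```

      For a trial with first-port pokes at (2.0, 3.0) and (5.0, 9.3) and a second-port entry at 10.1 seconds, this yields a switch response time of 9.3 seconds and a traversal time of about 0.8 seconds.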

      And in the results on Line 131: 

      “We investigated cognitive processing in the striatum using a well-described mouse-optimized interval timing task which requires mice to respond by switching between two nosepokes after a 6-second interval (Fig 1A; see Methods; (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Tosun et al., 2016; Weber et al., 2023)). In this task, mice initiate trials by responding at a back nosepoke, which triggers auditory and visual cues for the duration of the trial. On 50% of trials, mice were rewarded for nosepoking after 6 seconds at the designated ‘first’ front nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).

      We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (8.4 – 9.7; median (IQR); see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (2) There are a lot of time parameters in this behavior task, the description of those time parameters is mentioned in several parts, in the figure legend, supplementary figure legend, and methods, but was not defined clearly in the main text. It is inconvenient, sometimes, confusing for the readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text. 

      We agree. We have clarified this in a new schematic, shading the interval in gray:   

      And in the results on line 131:

      “We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (8.4 – 9.7; median (IQR); see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch". 

      This is a great suggestion. We analyzed such error trials and MSN activity in Figure 6 of Bruce et al., 2021. However, this manuscript was not designed to analyze errors, as they are rare beyond initial training (Bruce et al., 2021 focused on early training), and too inconsistent to permit robust analysis. This was added to the methods on Line 567:

      “Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, or going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke. 

      We have now defined it explicitly in the schematic: 

      Incidentally, this reviewer asked us to analyze a longer epoch – this analysis beautifully justifies our focus on the first 6 seconds (now in Figure S2).

      We focus on the first six seconds as there are few nosepokes and switch responses during this epoch; however, we consider the reviewer’s definition and analyze the epoch the reviewer suggests from 0 to the switch in analyses below. 

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. 

      We are glad the reviewer raised these points. First, our tagging dataset is relatively standard for optogenetic tagging. Second, we now include Cohen’s d for both PC and slope results for all optogenetic tagging analysis, which demonstrate that we have adequate statistical power and medium-to-large effect sizes (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (-0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – -0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      We added boxplots to Figure 3, which better highlight differences in these distributions.

      However, the reviewer’s point is well-taken, and we have added a caveat to the discussion exactly as the reviewer suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity. 

      We were not clear; we now do exactly as the reviewer suggested. We are not pooling any data. Instead, as we state on line 620, we are using linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer suggested (see letter). We state this explicitly in the methods (Line 704):

      “Statistics. All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear mixed-effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
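      For readers unfamiliar with this class of model, a generic random-intercept form is shown below. It is an illustrative sketch only: the exact model formulas passed to fitglme and lmer are not given in this response, and the symbols are assumptions introduced here. In this notation, $y_{ij}$ is the measure of interest (for example, a PC1 score or slope) for neuron $j$ recorded in mouse $i$, $\mathrm{type}_{ij}$ codes the cell type or drug condition, and $u_i$ is the mouse-specific random effect.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{type}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      The test of the fixed effect $\beta_1$ is then made against variability that includes between-mouse differences, so neurons from the same mouse are not treated as fully independent samples.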

      We have now stated in the results that we are explicitly accounting for variance between mice (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      All statistics in the manuscript now explicitly account for variance between mice. 

      This is the approach that was recommended by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa, which reviews all of our work.

      We note that these Cohen’s d values are usually interpreted as medium or large. 

      We performed statistical power calculations and include these to aid readers’ interpretation. These are all >0.8. 

      Finally, the reviewer uses the word ‘trend’. We define p values <0.05 as significant in the methods, and do not interpret trends (on line 717): 

      “P values < 0.05 were interpreted as significant.”

      And, we have now plotted values for each mouse in a new Figure S3.

      As noted in the figure legend, mouse-specific effects were analyzed using linear models that account for between-mouse variability, as discussed with our statisticians. However, the reviewer’s point is well taken, and we have added this idea to the discussion as suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity? 

      This is a key point. First, we are not certain what state the animal is in until they initiate trials at the back nosepoke (“Start”). Therefore, we cannot analyze this epoch.  

      However, we can show neuronal activity during a longer epoch exactly as the reviewer suggested. Although there are modulations, the biggest difference between D2 and D1 MSNs is during the 0-6 second interval. This analysis supports our focus on the 0-6 second interval. We have included this as a new Figure S2.

      (7) The authors were focused on the "switch " behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window. 

      This is a great suggestion. We did exactly this and adjusted our linear models on a trial-by-trial basis to account for time between the start of the interval and the switch. This is now added to the methods (line 656): 

      “We performed additional sensitivity analysis excluding outliers and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      And to the results (Line 201):

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”
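The trial-by-trial analysis described above can be illustrated with a short sketch: for each trial, firing rate is binned from trial start to that trial's switch time and a least-squares slope is fit. This is only an illustration under stated assumptions. The function name, the 0.2-second bin width, and the use of an ordinary least-squares line are hypothetical choices, not the authors' GLM, and the mouse-level random effect is not modeled here.

```python
import numpy as np


def trial_slope(spike_times, t_end, bin_width=0.2):
    """Least-squares slope of firing rate (Hz) versus time (s) for one trial.

    spike_times: spike times in seconds relative to trial start.
    t_end: end of the analysis window, e.g. 6.0 or that trial's switch time.
    bin_width: hypothetical bin size in seconds (an assumption).
    """
    edges = np.arange(0.0, t_end + 1e-9, bin_width)
    if len(edges) < 3:
        raise ValueError("window too short to fit a slope")
    counts, _ = np.histogram(spike_times, bins=edges)
    rates = counts / bin_width                # spikes per second in each bin
    centers = edges[:-1] + bin_width / 2.0    # bin midpoints in seconds
    slope, _intercept = np.polyfit(centers, rates, 1)
    return slope
```

Passing each trial's own switch time as `t_end` gives the per-trial window; passing 6.0 gives the fixed interval used in the main analysis.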

      We now state our justification for focusing on the first 6 seconds of the interval (Line 134):

      “Switch responses are guided by internal estimates of time and temporal control of action because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses.”

      As noted previously, epoch is now justified by Figure S2E.

      And we note that this focus minimizes motor confounds (Line 511):

      “Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      We are glad the reviewer suggested this analysis as it strengthens our manuscript.  

      Reviewer #3 (Public Review): 

      Summary: 

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions. 

      Strengths: 

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model. 

      We are grateful for the reviewer’s consideration of our work and for recognizing the strengths of our approach.  

      Weaknesses: 

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is a key point, and the reviewer is correct. We use our task because of its translational validity; as far as we know, temporal bisection tasks have been used less often in human disease and in rodent models. We have included a new paragraph describing this in the discussion (Line 472):

      “Because interval timing is reliably disrupted in human diseases of the striatum such as Huntington’s disease, Parkinson’s disease, and schizophrenia (Hinton et al., 2007; Singh et al., 2021; Ward et al., 2011), these results have relevance to human disease. Our task version has been used extensively to study interval timing in mice and humans (Balci et al., 2008; Bruce et al., 2021; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). However, temporal bisection tasks, in which animals hold during a temporal cue and respond at different locations depending on cue length, have advantages in studying how animals time an interval because animals are not moving while estimating cue duration (Paton and Buonomano, 2018; Robbe, 2023; Soares et al., 2016). Our interval timing task version – in which mice switch between two response nosepokes to indicate their interval estimate has elapsed – has been used extensively in rodent models of neurodegenerative disease (Larson et al., 2022; Weber et al., 2024, 2023; Zhang et al., 2021), as well as in humans (Stutt et al., 2024). Furthermore, because many therapeutics targeting dopamine receptors are used clinically, these findings help describe how dopaminergic drugs might affect cognitive function and dysfunction. Future studies of D2-MSNs and D1-MSNs in temporal bisection and other timing tasks may further clarify the relative roles of D2- and D1-MSNs in interval timing and time estimation.”

      Furthermore, we have modified the use of the definition of interval timing in the abstract, introduction, and results to reflect the reviewer’s comment. For instance, in the abstract (Line 43):

      “We studied dorsomedial striatal cognitive processing during interval timing, an elementary cognitive task that requires mice to estimate intervals of several seconds and involves working memory for temporal rules as well as attention to the passage of time.”

      However, we think it is important to use the term ‘interval timing’ as it links to past work by our group and others.   

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels. 

      This is a key point raised by other reviewers as well. We have now included measures of statistical power (as we interpret the reviewer’s comment of predictive power) and effect size, and performed additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A);  Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98).  We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      These are medium-to-large Cohen’s d results, and we have adequate statistical power. These results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distribution.

      Finally, we note that our conclusions are drawn from many convergent analyses (on Line 216): 

      “Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing.”

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. 

      This is an important point. We are well aware of heating effects with optogenetics and other potential confounds. For the exact reasons noted by the reviewer, we had opsin-negative controls, in which the laser was on for the exact same amount of time (18 seconds) and at the same power (12 mW), in Figure S5. We have now better highlighted these controls in the methods (Line 598):

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials. We performed control experiments in mice without opsins using identical laser parameters in D2-cre or D1-cre mice (Fig S6).”

      And in results (Line 298):

      “Importantly, we found no reliable effects for D2-MSNs with opsin-negative controls (Fig S6).”

      And (Line 306): 

      “As with D2-MSNs, we found no reliable effects with opsin-negative controls in D1-MSNs (Fig S6).”

      We have highlighted these data in Figure S6.

      Furthermore, the effect of optogenetic inhibition is similar to pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2024) (Line 459): 

      “Past pharmacological work from our group and others has shown that disrupting D2- or D1-MSNs slows timing (De Corte et al., 2019b; Drew et al., 2007, 2003; Stutt et al., 2024), in line with pharmacological and optogenetic results in this manuscript.”

      And in the discussion section on Line 488: 

      “Our approach has several limitations. First, systemic drug injections block D2- and D1-receptors in many different brain regions, including the frontal cortex, which is involved in interval timing (Kim et al., 2017a). D2 blockade or D1 blockade may have complex effects, including corticostriatal or network effects that contribute to changes in D2-MSN or D1-MSN ensemble activity. We note that optogenetic inhibition of D2-MSNs and D1-MSNs produces similar effects to pharmacology in Figure 5.”

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum. 

      This is a great point - we did this experiment in De Corte et al, 2019 with local drug infusions. This earlier study was the departure point for this experiment. We now point this out in the introduction (Line 92): 

      “Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data lead to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval.”

      However, the reviewer makes a great point - and we will develop this in our future work (Line 485): 

      “Future studies might extend our work combining local pharmacology with neuronal ensemble recording.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Just a few minor notes: 

      (1) Figures 2C and D should have error bars. 

      We agree.  We added error bars to these figures and other rasters as recommended.  

      (2) Figures 2G and H seem to be smoothed - how was this done? 

      We added these details.

      (3) It is unclear what the 'neural network machine learning classifier' mentioned in lines 193-199 adds if the data relevant to this analysis isn't presented. I would potentially include this. 

      We agree. This analysis was confusing and not relevant to our main points; consequently, we removed it.  

      Reviewer #2 (Recommendations For The Authors): 

      Major: 

      (1)  For Figure 2, the description of the main results in (C-F) in the main text is too brief and is not clear. 

      We have added to and clarified this text (Line 147):

      “Striatal neuronal populations are largely composed of MSNs expressing D2-dopamine or D1-dopamine receptors. We optogenetically tagged D2-MSNs and D1-MSNs by implanting optrodes in the dorsomedial striatum and conditionally expressing channelrhodopsin (ChR2; Fig S1) in 4 D2-Cre (2 female) and 5 D1-Cre transgenic mice (2 female). This approach expressed ChR2 in D2-MSNs or D1-MSNs, respectively (Fig 2A-B; Kim et al., 2017a). We identified D2-MSNs or D1-MSNs by their response to brief pulses of 473 nm light; neurons that fired within 5 milliseconds were considered optically tagged putative D2-MSNs (Fig S1B-C). We tagged 32 putative D2-MSNs and 41 putative D1-MSNs in a single recording session during interval timing. There were no consistent differences in overall firing rate between D2-MSNs and D1-MSNs (D2-MSNs: 3.4 (1.4 – 7.2) Hz; D1-MSNs 5.2 (3.1 – 8.6) Hz; F = 2.7, p = 0.11 accounting for variance between mice). Peri-event rasters and histograms from a tagged putative D2-MSN (Fig 2C) and from a tagged putative D1-MSN (Fig 2D) demonstrate prominent modulations for the first 6 seconds of the interval after trial start. Z-scores of average peri-event time histograms (PETHs) from 0 to 6 seconds after trial start for each putative D2-MSN are shown in Fig 2E and for each putative D1-MSN in Fig 2F. These PETHs revealed that for the 6-second interval immediately after trial start, many putative D2-MSN neurons appeared to ramp up while many putative D1-MSNs appeared to ramp down. For 32 putative D2-MSNs average PETH activity increased over the 6-second interval immediately after trial start, whereas for 41 putative D1-MSNs, average PETH activity decreased. These differences resulted in distinct activity early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice) between D2-MSNs and D1-MSNs. Examination of a longer interval of 10 seconds before to 18 seconds after trial start revealed the greatest separation in D2-MSN and D1-MSN dynamics during the 6-second interval after trial start (Fig S2). Strikingly, these data suggest that D2-MSNs and D1-MSNs might display opposite dynamics during interval timing.”

      (2)  For Figure 3 

      (A)  Is the PC1 calculated from all MSNs of all mice (4 D2, 5 D1 mice)? 

      We clarified this (Line 182):

      “We analyzed PCA calculated from all D2-MSNs and D1-MSNs PETHs over the 6-second interval immediately after trial start.”

      And for pharmacology (Line 362): 

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together.”
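To make the PCA step concrete, the sketch below z-scores each neuron's PETH, centers the matrix across neurons, and takes the first principal component by singular value decomposition. It is an illustration under assumptions, not the authors' code: the function name, the array layout (neurons by time bins), and the centering convention are hypothetical, and the sign of a principal component is arbitrary.

```python
import numpy as np


def pc1_scores(peths):
    """First principal component of z-scored PETHs.

    peths: array of shape (n_neurons, n_time_bins) holding firing rates.
    Returns (pc1 time course, per-neuron PC1 scores, variance explained).
    Neurons with constant firing (zero SD) would need to be excluded first.
    """
    # Z-score each neuron's PETH across time bins.
    z = (peths - peths.mean(axis=1, keepdims=True)) / peths.std(axis=1, keepdims=True)
    # Center each time bin across neurons before the decomposition.
    centered = z - z.mean(axis=0, keepdims=True)
    _u, s, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    scores = centered @ pc1
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pc1, scores, explained
```

With a ramp-like PC1, neurons that ramp in opposite directions receive scores of opposite sign, which is the contrast summarized by the PC1 scores in Fig 3C.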

      (B)  The authors should perform PCA on single mouse data, and add the plot and error bar. 

      This is a great idea. We have now included this as a new Figure S3.

      (C)  As mentioned before, both D2-or D1- MSNs can be divided into three groups, it is not appropriate to put them together as each MSN is not an independent variable, the authors should do the statistics based on the individual mouse, and do the parametric or non-parametric comparison, and plot N (number of mice) based error bars. 

      We have done exactly this using a linear mixed effects model, as recommended by our statistics core. They have explicitly suggested that this is the best approach to these data (see letter). We have also included measures of statistical power and effect size (Line 704):  

      “All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear-mixed effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
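For readers less familiar with this approach, a random-intercept model of the kind described in the quoted methods can be written as below. This is a sketch under assumptions, not the exact specification fit with fitglme or lmer: the symbols, the single fixed effect for cell type, and the Gaussian error terms are illustrative choices.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{type}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the measure for neuron $i$ recorded in mouse $j$ (for example a PC1 score or a firing-rate slope), $\mathrm{type}_{ij}$ indicates D2-MSN versus D1-MSN, and $u_j$ is the mouse-specific random intercept. Because neurons from the same mouse share $u_j$, they are not treated as independent observations when testing $\beta_1$.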

      We have now included measures of ‘power’ (which we interpret to be statistical) and effect size, and performed additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98).  We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      These are medium-to-large Cohen’s d results, and we have adequate statistical power. These results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distributions.

      (3) For results in Figure 5 and Figure S7, according to Figure 1 legend, lines 4 to 5, the response times were defined as the moment mice exit the first nose poke (on the left) to respond at the second nose poke; and according to method session (line 522), "switch" traversal time was defined as the duration between first nose poke exit and second nose poke entry. It seems that response time is the switch traversal time, they should be the same, but in Figures B and D, the response time showed a clear difference between the laser off and on groups, while in Figures S7 C, and G, there were no differences between laser off and on group for switch traversal time. Please reconcile these inconsistencies. 

      We were not clear. We now clarify – switch responses are the moment when mice depart the first nosepoke, whereas traversal time is the time between departing the first nosepoke and arriving at the second nosepoke. We have reworked our figures to make this clear.

      And in the methods (Line 570):

      “Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate if more than 6 seconds have elapsed without receiving a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials in which an animal nosepokes in the wrong order or does not nosepoke are relatively rare and discarded. Consequently, we focused on these switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry and is distinct from switch response time when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit for the switch response times only. Trials were self-initiated, but there was an intertrial interval with a geometric mean of 30 seconds between trials.”
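The three behavioral measures defined in the quoted methods differ only in which event timestamps they use, which a short sketch makes explicit. The class and field names here are hypothetical, and all times are assumed to be in seconds relative to trial start.

```python
from dataclasses import dataclass


@dataclass
class SwitchTrial:
    # Hypothetical field names; times are seconds from trial start.
    first_poke_entry: float   # entry into the first nosepoke before the switch
    first_poke_exit: float    # departure from the first nosepoke (the switch)
    second_poke_entry: float  # arrival at the second nosepoke


def switch_response_time(trial: SwitchTrial) -> float:
    """Moment the animal departs the first nosepoke."""
    return trial.first_poke_exit


def traversal_time(trial: SwitchTrial) -> float:
    """Duration between first nosepoke exit and second nosepoke entry."""
    return trial.second_poke_entry - trial.first_poke_exit


def nosepoke_duration(trial: SwitchTrial) -> float:
    """Time between first nosepoke entry and exit for the switch response."""
    return trial.first_poke_exit - trial.first_poke_entry
```

For a trial with entry at 5.0 s, exit at 7.5 s, and second-nosepoke entry at 9.0 s, the switch response time is 7.5 s while the traversal time is 1.5 s, so a manipulation can shift one without changing the other.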

      And in Figure S8, we have added graphics and clarified the legend.

      (4) The first nose poke and second nose poke are very close, why did it take so long to move from the first nose poke to the second nose poke, even though the mouse already made the decision to switch? Please see Figure S1A, it took less than 6s from the back nose poke to the first nose poke, but it took more than 6s (up to 12s) from the first nose poke to the second nose poke, what were the mice's behavior during this period? 

      This is a key detail. There is no temporal urgency as only the initial nosepoke after 18 seconds leads to reward. In other words, making a second nosepoke prior to 18 seconds is not rewarded and, in well-trained animals, is wasted effort. We have added these details to the methods (Line 124):

      “On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).”

      And in Figure 1, as described in detail above. 

      (5) How many trials did mice perform in one day? How many recordings/day for how many days were performed? 

      These are key details that we have now added to Table 1.

      We have added the number of recording sessions to the methods (Line 603): 

      “For optogenetic tagging, putative D1- and D2-MSNs were optically identified via 473-nm photostimulation. Units with mean post-stimulation spike latencies of ≤5 milliseconds and a stimulated-to-unstimulated waveform correlation ratio of >0.9 were classified as putative D2-MSNs or D1-MSNs (Ryan et al., 2018; Shin et al., 2018). Only one recording session was performed for each animal per day, and one recording session was included from each animal.”

      And Line 606: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”
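The two optogenetic tagging criteria quoted above (mean post-stimulation latency of ≤5 milliseconds and waveform correlation >0.9) can be expressed as a simple check. This is a sketch under assumptions: the function and parameter names are hypothetical, and the waveform measure is taken here to be the Pearson correlation between mean stimulated and unstimulated waveforms, which the source does not spell out.

```python
import numpy as np


def is_putatively_tagged(latencies_ms, stim_waveform, unstim_waveform,
                         max_latency_ms=5.0, min_corr=0.9):
    """Apply the latency and waveform-similarity criteria to one unit.

    latencies_ms: post-stimulation spike latencies in milliseconds.
    stim_waveform, unstim_waveform: mean waveforms sampled at the same points.
    """
    mean_latency = float(np.mean(latencies_ms))
    # Assumption: similarity is the Pearson correlation of the two waveforms.
    corr = float(np.corrcoef(stim_waveform, unstim_waveform)[0, 1])
    return mean_latency <= max_latency_ms and corr > min_corr
```

A unit must pass both tests; a short latency with a dissimilar waveform, or a matching waveform with a long latency, is not counted as tagged.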

      (6) For results in Figure 5, the authors should analyze the speed for the laser on and off group, since the dorsomedial striatum was reported to be related to control of speed (Yttri, Eric A., and Joshua T. Dudman. "Opponent and bidirectional control of movement velocity in the basal ganglia." Nature 533.7603 (2016): 402-406.). 

      We have some initial DeepLabCut data and have included it in a new Figure 1E.

      B) DeepLabCut tracking of position during the interval timing revealed that mice moved quickly after trial start and then velocity was relatively constant throughout the trial

      We measure movement speed using nosepoke duration and traversal time, which can give some measure of movement velocity.

      In Yttri and Dudman, the mice are head-fixed and moving a joystick, whereas our mice are freely moving. However, we have now included the lack of motor control as a major limitation (Line 510): 

      “Finally, movement and motivation contribute to MSN dynamics (Robbe, 2023). Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      (7)  Figure S3 (C, E, and F), statistics should be done based on N (number of mice), not on the number of recorded neurons.  

      We have removed this section, and all other statistics in the paper properly account for mouse-specific variance, as noted above.

      (8)  Figure S1 

      (A) Are these the results from all mice superposed together, or from one mouse on one given day? How many of the trials' data were superposed?

      We included these details in a new Figure 1.

      (B, C) How many trials were included? 

      (D) How many days did these data cover? 

      We have included a new Table 1 with these important details.

      We have noted that only 1 recording session / mouse was included in analysis (Line 606):

      “Only one recording session was performed for each animal per day, and one recording session was included from each animal.”

      And Line 614: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”

      (9) Figure S2 

      (A) Can the authors add coordinates of the brain according to the mouse brain atlas or, alternatively, show it using a coronal section? 

      Great idea – added to Figure S2 legend: 

      “Figure S1: A) Recording locations in the dorsomedial striatum (targeting AP +0.4, ML -1.4, DV -2.7). Electrode reconstructions for D2-Cre (red), D1-Cre (blue), and wild-type mice (green). Only the left striatum was implanted with electrodes in all animals.”

      We have also added it to Figure S5 legend: 

      “Figure S5: Fiber optic locations from A) an opsin-expressing mouse with mCherry-tagged halorhodopsin and bilateral fiber optics, and B) across 10 D2-Cre mice (red) and 6 D1-cre mice (blue) with fiber optics (targeting AP +0.9, ML +/-1.3, DV –2.5).”

      (C) Why did the waveform of laser and no laser seem the same? 

      The optogenetically tagged spike waveforms are highly similar, indicating that optogenetically-triggered spikes are like other spikes. That is the main point – optogenetically stimulating the neuron does not change the waveform. We have added this detail to the legend of S1: 

      “Inset on bottom right – waveforms from laser trials (red) and trials without laser (blue). Across 73 tagged neurons, the waveform correlation coefficient for laser trials vs. trials without laser was r = 0.97 (0.92-0.99). These data demonstrate that optogenetically triggered spikes are similar to non-optogenetically triggered spikes.”
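
      The waveform comparison described in this legend can be sketched as a Pearson correlation between two mean waveforms. This is only an illustration under stated assumptions: each waveform is taken to be an equal-length list of voltage samples, the sample values are invented, and the function name `pearson_r` is hypothetical rather than taken from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("waveforms must have the same length (at least 2 samples)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a flat waveform")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean waveforms (arbitrary units) for a single tagged neuron.
laser_waveform = [0.0, -0.2, -1.0, -0.4, 0.3, 0.5, 0.2, 0.0]
no_laser_waveform = [0.0, -0.25, -0.95, -0.45, 0.28, 0.52, 0.18, 0.02]

r = pearson_r(laser_waveform, no_laser_waveform)
```

      A value of `r` close to 1 indicates that the two waveform shapes are nearly identical, which is the property the legend reports across tagged neurons.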

      (10)  Figure S7, what was the laser power used in this experiment? Have the authors tried different laser powers? 

      We have now clarified the laser power on line 598: 

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials.”

      And for Figure S6 (was S7 previously): 

      We did not try other laser powers; our parameters were chosen a priori based on our past work.  

      (11)  In Figure S9, what method was used to sort the neurons? 

      We now clarify in the methods (Line 617): 

      “Electrophysiology. Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms. The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorting them using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via R² between sessions.”
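
      Criterion 3 above can be illustrated with a short check on interspike intervals. This is a minimal sketch, not the Offline Sorter procedure: it assumes spike times are given in seconds, the function name `refractory_violation_fraction` and the example spike trains are hypothetical, and the 2 ms threshold is the only value taken from the text.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of interspike intervals shorter than the refractory period."""
    times = sorted(spike_times_s)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return 0.0  # fewer than two spikes: no intervals to violate
    violations = sum(1 for isi in intervals if isi < refractory_s)
    return violations / len(intervals)


# Hypothetical spike trains (seconds from recording start).
clean_unit = [0.010, 0.150, 0.320, 0.500, 0.710]
contaminated_unit = [0.010, 0.011, 0.150, 0.320, 0.3205]

clean_fraction = refractory_violation_fraction(clean_unit)
contaminated_fraction = refractory_violation_fraction(contaminated_unit)
```

      A well-isolated unit should yield a fraction at or near zero; a high fraction suggests that spikes from more than one neuron were merged into the cluster.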

      And in the results (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs D2 blockade r = 1.00 (0.99 – 1.00), rank sum vs correlations in unmatched waveforms p = 3 × 10⁻⁴⁴; saline vs D1 blockade r = 1.00 (1.00 – 1.00), rank sum vs correlations in unmatched waveforms p = 4 × 10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade 4.9 (2.4 – 7.8) Hz).”

      (C-F) statistics should be done based on the number of mice, not on the number of recorded neurons. 

      We agree; all experiments are now quantified using linear mixed-effects models, which formally account for variance contributed across animals, as discussed at length earlier in the review and with statistical experts at the University of Iowa.

      (12) For results in Figure 6, did the authors do cell-type specific recording on D1 or D2 MSNs using optogenetic tagging? As the D1- or D2- MSNs account for ~50% of all MSNs, the inhibition of a considerable amount of neurons was not observed. The authors should discuss the relation between the results from optogenetic inhibition of D1- or D2- MSNs and pharmacological disruption of D1 or D2 dopamine receptors. 

      This is a great point. First, we did not combine cell-type specific recordings with tagging as it was difficult to get enough trials for analysis in a single session in the tagging experiments, and pharmacological interventions can further decrease performance.  However, we have made our results in Figure 6 much more focused.

      We have discussed the relationship between these data in the results (Line 380): 

      “This data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1.  When combined with major contributions of D1/D2 MSNs to PC1 (Fig 3C) these findings show that pharmacologically disrupting D2 or D1 MSNs can disrupt ramping-related activity in the striatum.”

      And in the discussion (Line 417): 

      “Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing. MSN dynamics helped construct and constrain a four-parameter drift-diffusion model in which D2- and D1-MSN spiking accumulated temporal evidence. This model predicted that disrupting either D2-MSNs or D1-MSNs would increase response times. Accordingly, we found that optogenetically or pharmacologically disrupting striatal D2-MSNs or D1-MSNs increased response times without affecting task-specific movements. Disrupting D2-MSNs or D1-MSNs shifted MSN temporal dynamics and degraded MSN temporal encoding. These data, when combined with our model predictions, demonstrate that D2-MSNs and D1-MSNs contribute temporal evidence to controlling actions in time.”

      And: 

      “D2-MSNs and D1-MSNs play complementary roles in movement. For instance, stimulating D1-MSNs facilitates movement, whereas stimulating D2-MSNs impairs movement (Kravitz et al., 2010). Both populations have been shown to have complementary patterns of activity during movements (Tecuapetla et al., 2016), with MSNs firing at different phases of action initiation and selection. Further dissection of action selection programs reveals that opposing patterns of activation among D2-MSNs and D1-MSNs suppress and guide actions, respectively, in the dorsolateral striatum (Cruz et al., 2022). A particular advantage of interval timing is that it captures a cognitive behavior within a single dimension — time. When projected along the temporal dimension, it was surprising that D2-MSNs and D1-MSNs had opposing patterns of activity. Past pharmacological work from our group and others has shown that disrupting D2 or D1 MSNs slows timing (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023), in line with pharmacological and optogenetic results in this manuscript. Computational modeling predicted that disrupting either D2-MSNs or D1-MSNs increased self-reported estimates of time, which was supported by both optogenetic and pharmacological experiments. Notably, these disruptions are distinct from increased timing variability reported with administrations of amphetamine, ventral tegmental area dopamine neuron lesions, and rodent models of neurodegenerative disease (Balci et al., 2008; Gür et al., 2020, 2019; Larson et al., 2022; Weber et al., 2023). Furthermore, our current data demonstrate that disrupting either D2-MSN or D1-MSN activity shifted MSN dynamics and degraded temporal encoding, supporting prior work (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023).
Our recording experiments do not identify where a possible response threshold T is instantiated, but downstream basal ganglia structures may have a key role in setting response thresholds (Toda et al., 2017).”

      (13) For Figure 2, what is the error region for G and H? Is there a statistically significant difference between the start (e.g., 0-1 s) and the end (e.g., 5-6 s) time? 

      G and H are standard error, which we have now clarified.

      And on Line 166: 

      “These differences resulted in distinct activity early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice) between D2-MSNs and D1-MSNs.”

      Minor: 

      (1)  Figure 2 legend showed the wrong label: "Peri-event raster C) from a D2-MSN (red) and E) from a D1-MSN (blue)". It should be (D).

      Fixed, thank you.  

      (2)  Figure 2. Missing legend for (E) and (F).  

      Fixed, thank you.  

      (3)  Line 423: mistyped "\" 

      Fixed, thank you.  

      Reviewer #3 (Recommendations For The Authors): 

      -  To clarify that complementary means opposing in this context, I suggest changing the title. 

      This is a helpful suggestion. We have changed it exactly as the reviewer suggested: 

      “Complementary opposing D2-MSNs and D1-MSNs dynamics during interval timing”

      -  I recommend adding a supplementary figure to demonstrate all the nose pokes in all trials in a given session. The current figures make it hard to assess the specifics of the behavior. For example, what happens if, in a long-interval trial, the mouse pokes in the second nose poke before 6 seconds? Is that behavior punished? Do they keep alternating between the nose poke or do they stick to one nose poke? 

      We agree. We think this is a main point, and we have now redesigned Figure 1 to describe these details: 

      And added these details to the methods (Line 548): 

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers, with two front nosepokes flanking either side of a food hopper on the front wall, and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated first ‘front’ nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for initial nosepokes at the second nosepoke location after 18 seconds when preceded by a nosepoke at the first nosepoke location. Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, and going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced.
We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”
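
      The reward rule for long (18-second) trials described above can be sketched as follows. This is a simplified reading of the methods text, not the task-control code: the function name `long_trial_rewarded`, the port labels, and the choice to treat "after 18 seconds" as "at or after 18 seconds" are all assumptions made for illustration.

```python
def long_trial_rewarded(pokes, long_interval_s=18.0):
    """Return True if a long trial earns a reward.

    pokes: iterable of (time_s, port) pairs, with time measured from trial
    start and port being either "first" or "second".
    A reward requires a poke at the second port at or after the long
    interval, preceded by at least one poke at the first port.
    """
    visited_first = False
    for time_s, port in sorted(pokes):
        if port == "first":
            visited_first = True
        elif port == "second" and time_s >= long_interval_s and visited_first:
            return True
    return False


# Hypothetical trials.
switch_trial = [(2.0, "first"), (9.0, "second"), (18.5, "second")]
early_only = [(2.0, "first"), (9.0, "second")]
second_only = [(19.0, "second")]
first_only = [(3.0, "first"), (20.0, "first")]
```

      Under this reading, early pokes at either port are simply ignored rather than punished, and trials with responses at only one port are never rewarded, matching the description of error trials.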

      -  Figures 2E and 2F suggest that some D1 cells ramp up during the first 6 seconds, while others ramp down. The same is more or less true for D2s. I wonder if the analysis will lose its significance if the two outlier D1s are excluded from Figure 3D. 

      This is a great idea suggested by multiple reviewers. We repeated this analysis with outliers removed. We used a data-driven approach to remove outliers (Line 656): 

      “We performed additional sensitivity analysis excluding outliers outside of 95% confidence intervals and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      And described these data in the results (Line 201): 

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      Finally, we removed the outliers the reviewers alluded to – two D1 MSNs – and found similar results (F=6.59, p=0.01 for main effect of D2 vs. D1 MSNs controlling for between-mouse variability). We elected to include the more data driven approach based on 95% confidence intervals.

    1. Author response:

      Reviewer #1:

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons.

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding.

      c. It is imperative that we subject both single-cell and bulk data through the same treatments because otherwise, it will be difficult to perform algebraic operations on them.

      d. We rely on tumor bulk transcriptomics data from TCGA due to its high sample size and patient meta-data such as information pertaining to patient survival.

      We will discuss this in the revised manuscript.
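
      Step (a) above, building pseudo-bulk profiles for single-cell clusters, can be sketched as below. The response does not state how cells are aggregated, so summing raw counts per gene is an assumption here; the function name `pseudo_bulk`, the gene symbols, and the cluster labels are likewise illustrative.

```python
from collections import defaultdict


def pseudo_bulk(counts_per_cell, cluster_labels):
    """Sum per-gene counts over all cells that share a cluster label.

    counts_per_cell: list of {gene: count} dicts, one per cell.
    cluster_labels: list of cluster labels, one per cell.
    """
    if len(counts_per_cell) != len(cluster_labels):
        raise ValueError("need exactly one cluster label per cell")
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_counts, label in zip(counts_per_cell, cluster_labels):
        for gene, count in cell_counts.items():
            profiles[label][gene] += count
    # Convert nested defaultdicts to plain dicts for the caller.
    return {label: dict(genes) for label, genes in profiles.items()}


# Hypothetical toy data: three cells in two clusters.
cells = [{"AR": 5, "KLK3": 2}, {"AR": 3}, {"SYP": 7, "AR": 1}]
labels = ["c0", "c0", "c1"]
profiles = pseudo_bulk(cells, labels)
```

      Each resulting profile has the same shape as a bulk sample (one value per gene), which is what allows single-cell clusters and bulk tumors to pass through the same downstream encoding.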

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Vector algebra operations are not done in the gene expression space (i.e., on gene expression vectors associated with tumor samples); rather, we process the single-cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single-cell clusters, mapping gene expression values to word frequencies as better understood by the Doc2vec neural networks, etc.) to ensure their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer.

      We will discuss this in the revised manuscript.
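
      The subtraction the reviewer refers to can be pictured with a toy example in an embedding space. This is only a schematic under loud assumptions: the three-dimensional integer vectors are invented, the function name `subtract_embedding` is hypothetical, and nothing here reproduces how SCellBOW computes or uses its embeddings.

```python
def subtract_embedding(total, component):
    """Element-wise difference of two equal-length embedding vectors."""
    if len(total) != len(component):
        raise ValueError("embeddings must have the same dimensionality")
    return [t - c for t, c in zip(total, component)]


# Invented low-dimensional embeddings, for illustration only.
tumor_embedding = [4, 1, -3]
cluster_embedding = [2, -1, 1]

# Embedding of the tumor with the cluster's contribution removed.
residual_embedding = subtract_embedding(tumor_embedding, cluster_embedding)
```

      The point made in the response is that the operands are learned embeddings rather than raw expression vectors, so the difference is taken between representations that were produced by the same processing steps.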

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      In our study, we are primarily interested in signals from malignant cells. However, we may consider scRNA-seq data with stromal cells and test whether SCellBOW can identify the influence of different stromal cell types on cancer aggressiveness.

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We will elaborate on our discussion in the Result as well as Discussion sections.

      Reviewer #2:

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

      (1) Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent. However, as suggested by the Reviewer, we will try to incorporate results for liver cancer, subject to the availability of adequate data for model building.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      (1) The reviewers asked to clarify the BTH assay: The fused T25 and T18 domains must be in the cytoplasm to complement successfully. The authors stated that the N terminus of Aeg1 traverses the membrane once, which means that the T25-Aeg1 will have T25 in the periplasm. However, T18C vector fusion with other division proteins will have T18C of ZipA in the periplasm (ZipA's N terminus is on the periplasmic side of the inner membrane) while that of FtsN in the cytoplasm (FtsN's N terminus is in the cytoplasm). As such, it isn't easy to understand why T25-Aeg1 showed positive results for both ZipA and FtsN. Note that FtsL, FtsB, and FtsI all have the same topology as FtsN but showed negative results. It is possible that these fusion proteins do not fold correctly, and hence, the results cannot be interpreted directly. The authors did not address this concern but only cited that BTH is a commonly used assay for protein-protein interactions.

      In response to the editor's comments and the concerns raised by the reviewer, we have performed two sets of Aeg1-T25 fusion experiments to determine whether the Aeg1 topology impacts protein interactions measured by bacterial two-hybrid (BTH) assays. In the first set of experiments, we fused the T25 domain to the N-terminus of Aeg1 and still observed strong binding of Aeg1 to ZipA and FtsN, respectively. Similar results were obtained from the second set of experiments in which the T25 domain was fused to the C-terminus of Aeg1.

      These results indicate that the precise topology of Aeg1 does not significantly impact its ability to engage these binding partners. Aeg1 is predicted to harbor a single transmembrane domain; however, the precise location of this transmembrane segment differs in predictions made by different algorithms. The SMART Web site (1) predicted the transmembrane region to be located at the N-terminus of Aeg1 (7-29 aa). In contrast, Phobius, based on HMM (2, 3), suggested the transmembrane segment is situated more centrally within the Aeg1 protein (134-151 aa), and further proposed that the N-terminus may function as a signal peptide. This latter prediction also provides a potential explanation for the larger-than-expected molecular weight of the Aeg1 truncation mutant observed in the Western blot shown in Fig 1C. The removal of the putative signal peptide may have altered the protein structure, affecting its electrophoretic mobility. As a result, we are more inclined to favor the topology model for Aeg1 predicted by Phobius.

      (2) It is still difficult to identify the midcell localization patterns of Aeg1 and other division proteins from microscopy images (Fig. 4C and Fig. 5A). In Fig 4C, only ZipA and Aeg1 formed clear, regular band-like colocalization patterns. Others formed irregular co-localized puncta along the cell length, different from the expected midcell localization patterns. Cells also appeared to be much longer than WT cells, suggesting cell division defects. The most likely reason for these aberrant localization patterns and filamentous cells is that GFP/mCherry-fusions of these division proteins are not functional and become dominant negative, interfering with proper cell division. The authors need to test the functionality of these fusion proteins before they can be used for imaging. (The authors also mislabeled Hoechst and the division protein GFP panels labels in this figure.)

      Thank you for raising this important point. To examine the functionality of the fluorescence protein fusion constructs, we have painstakingly performed conditional knockout of the genes of interest (zipA, ftsB, ftsL, and ftsN) in A. baumannii strains inducibly expressing the corresponding fusion protein. We found that these fluorescence protein fusions were able to fully rescue the growth of the mutant lacking the corresponding fts gene (Figure 4-figure supplement 1). Concurrently, we have also successfully knocked out the aeg1 gene under conditions of in trans expression of an mCherry-Aeg1 fusion protein, which was able to effectively rescue the growth defects of the Δaeg1 mutant (Figure 4-figure supplement 1). We then introduced the functional fluorescence protein fusions into wild-type cells and observed the co-localization of Aeg1 with the relevant Fts proteins. The results showed that Aeg1 indeed co-localized with ZipA, FtsB, FtsL, and FtsN (Fig.4E, red arrows), but occasional non-co-localization was also observed (Fig.4E, white arrows).

      We have utilized the functional fluorescence protein fusion constructs to analyze the localization of relevant Aeg1-interacting proteins in the Δaeg1 strain upon Aeg1 depletion. Our results showed that the depletion of Aeg1 indeed impacted the midcell localization of several Aeg1-interacting Fts proteins.

      References

      (1) Letunic I, Khedkar S, Bork P. SMART: recent updates, new developments and status in 2020. Nucleic acids research. 2021;49:D458-D460. doi: 10.1093/nar/gkaa937

      (2) Käll L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. Journal of molecular biology. 2004;338:1027-36. doi: 10.1016/j.jmb.2004.03.016

      (3) Käll L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic acids research. 2007;35:W429-32. doi: 10.1093/nar/gkm256

    1. Author response:

      Reviewer #1:

      (1) Clarification of Novelty and Contribution:

      - We agree that the novelty of our study could have been better articulated. We will more clearly define the specific gaps in knowledge our study addresses. We will also clarify the novelty in our analysis of the correlational structure of gene expression under stress.

      (2) Methodological Details:

      - We acknowledge the need for additional detail in the methods section regarding the estimation of G, E, and GxE effects. We will expand this section to include the software used (R), the specific ANOVA models applied, and how significance was determined. We will also clarify which effects were treated as fixed or random effects.

      (3) Terminology Consistency:

      - We will thoroughly review the manuscript to ensure consistent use of selection-related terminology. This will involve distinguishing between quantitative genetics terms (e.g., directional, stabilizing) and molecular evolution terms (e.g., positive, purifying) to avoid any confusion.

      (4) Bias in Conditional Neutrality and Antagonistic Pleiotropy:

      - We appreciate the suggestion to clarify the discussion around conditional neutrality (CN) and antagonistic pleiotropy (AP). We will elaborate on the inherent bias in detecting CN and AP and specify how we adjusted P-value thresholds. Additionally, we will try to refine the discussion to address the concerns raised about the comparison of gene expression and local adaptation, incorporating relevant literature.

      Reviewer #2:

      (1) Sensitivity of Fitness Proxy:

      - We acknowledge the limitations of using the total filled grain number as a fitness proxy. We will include a discussion on the potential sensitivity of our results to this choice.

      (2) Cis- and trans-eQTL Contributions:

      - We appreciate the suggestion to report effect sizes in addition to the frequency of cis- and trans-eQTLs. We will incorporate this into our analysis and discuss whether our conclusions regarding the predominance of trans-eQTLs in expression variation hold when considering effect sizes.

      (3) Cis-Trans Relationship Analysis:

      - Since we wanted to estimate compensating vs. reinforcing effects, this essentially entails identifying genes that have opposing directionality of cis- and trans-effects. To get the total trans-effect, we decided to take the mean effect of trans-eQTLs. This mean was only used to identify the compensating/reinforcing genes, and although the mean diminishes the effect of small trans-eQTLs, this metric was not used in downstream analyses.
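
      The classification described above can be sketched as a sign comparison between the cis-effect and the mean trans-effect. This is a minimal illustration, not the authors' pipeline: the function name `classify_cis_trans`, the "undetermined" label for zero effects, and the effect sizes are all assumptions.

```python
from statistics import mean


def classify_cis_trans(cis_effect, trans_effects):
    """Label a gene by comparing the cis sign with the sign of the mean trans effect."""
    if not trans_effects:
        raise ValueError("at least one trans-eQTL effect is required")
    total_trans = mean(trans_effects)
    if cis_effect == 0 or total_trans == 0:
        return "undetermined"  # no direction to compare
    same_direction = (cis_effect > 0) == (total_trans > 0)
    return "reinforcing" if same_direction else "compensating"


# Hypothetical effect sizes for three genes.
gene_a = classify_cis_trans(0.5, [-0.2, -0.4])
gene_b = classify_cis_trans(0.5, [0.1, 0.3])
gene_c = classify_cis_trans(-0.5, [0.6, -0.1])
```

      In the third example the small negative trans-effect is outweighed by the larger positive one in the mean, which illustrates the dilution of small trans-eQTLs that the response acknowledges.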

      Reviewer #3:

      (1) Integration of Analyses:

      - We acknowledge that the manuscript currently presents some analyses in a somewhat independent manner. Although it would be ideal to have a central hypothesis/message, our study is meant to broadly outline the various responses and fitness effects of salinity stress on rice. Throughout the manuscript, we have also included comparisons between our findings and that of our previous studies on drought stress to highlight any consistent themes or novel insights.

      (2) X-by-Environment Effects:

      - We do plan to consider fitting models that explicitly incorporate X-by-environment interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      (3) Gene Grouping Methods:

      - We will try to discuss the pros and cons of using PCA versus gene co-expression network analysis (e.g., WGCNA) for grouping genes. We will also explore applying WGCNA in our analysis to see if it offers any additional insights or clarity.

      Reviewer #4:

      (1) Selection Analysis Across Environments:

      - We do plan to consider fitting models that explicitly incorporate G×E interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      (2) Gene Expression Trade-Offs Terminology:

      - We will revise our terminology to better reflect the nature of the trade-offs observed, and explore variation in covariance between phenotype and fitness between the two environments.

      (3) Biological Processes and Decoherence:

      - We will explore applying WGCNA in our analysis to see if it offers any additional insights or clarity.

      (4) Underutilization of Organismal Traits:

      - We did perform GWAS for all the traits measured in both environments, but did not find any significant hits. We will examine whether selection of co-expression modules are correlated with the traits, and may incorporate it in our manuscript depending on the results.

      (5) Detailed eQTL Analysis:

      - We will expand our eQTL analysis to include detailed statistics at the molecular trait level, including the phenotypic variance explained by cis- and trans-eQTLs and how these vary by environment.

      Although we focus on salinity conditions in our cis-trans compensation analysis in the main results, we have provided comparisons for all our eQTL analyses between normal and salinity conditions in the main text (with figures as supplementary).

      We are confident that these revisions will significantly strengthen our manuscript and address the concerns raised by the reviewers. We look forward to submitting a revised version that better communicates the significance and robustness of our findings.

      Thank you again for your valuable feedback.

    1. Author response:

      eLife assessment

      The authors present a potentially useful approach of broad interest arguing that anterior cingulate cortex (ACC) tracks option values in decisions involving delayed rewards. The authors introduce the idea of a resource-based cognitive effort signal in ACC ensembles and link ACC theta oscillations to a resistance-based strategy. The evidence supporting these new ideas is incomplete and would benefit from additional detail and more rigorous analyses and computational methods.

      The reviewers have provided several excellent suggestions and pointed out important shortcomings of our manuscript. We are grateful for their efforts. To address these concerns, we are planning a major revision to the manuscript. In the revision, our goal is to address each of the reviewers’ concerns and codify the evidence for resistance- and resource-based control signals in the rat anterior cingulate cortex. We have provided a nonexhaustive list of the points we plan to address in the point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Young (2.5 mo [adolescent]) rats were tasked to either press one lever for immediate reward or another for delayed reward.

      Please note that the rats were > 4 months old at the time of training and testing.

      The task had a complex structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row. Importantly, this task is very different from most intertemporal choice tasks which adjust delay (to the delayed lever), whereas this task held the delay constant and adjusted the number of 20 mg sucrose pellets provided on the immediate value lever.

      Several studies parametrically vary the immediate lever (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183). While most versions of the task will yield qualitatively similar estimates of discounting, the adjusting-amount version is preferred as it provides the most consistent estimates (PMID: 22445576). More specifically, this version of the task avoids the contrast effects that result from changing the delay during the session (PMID: 23963529, 24780379, 19730365, 35661751), which complicate value estimates.

      Analyses are based on separating sessions into groups, but group membership includes arbitrary requirements and many sessions have been dropped from the analyses.

      We are in discussions about how to address this valid concern. This includes simply splitting the data by delay. This approach, however, has conceptual problems that we will also lay out in a full revision.  

      Computational modeling is based on an overly simple reinforcement learning model, as evidenced by fit parameters pegging to the extremes.

      We apologize for not doing a better job of explaining the advantages of this type of model for the present purposes. Nevertheless, given the clear lack of enthusiasm, we felt it was better to simply update the model as suggested by the Reviewers. The straightforward modifications have now been implemented and we are currently in discussion about how the new results fit into the larger narrative.

      The neural analysis is overly complex and does not contain the necessary statistics to assess the validity of their claims.

      We plan to streamline the existing analysis and add statistics, where required, to address this concern.

      Strengths:

      The task is interesting.

      Thank you for the positive comment.

      Weaknesses:

      Behavior:

      The basic behavioral results from this task are not presented. For example, "each recording session consisted of 40 choice trials or 45 minutes". What was the distribution of choices over sessions? Did that change between rats? Did that change between delays? Were there any sequence effects? (I recommend looking at reaction times.) Were there any effects of pressing a lever twice vs after a forced trial?

      Animals tend to make more immediate choices as the delay is extended, which is reflected in Figure 1. We will add more detail and additional statistics to address these questions. 

      This task has a very complicated sequential structure that I think I would be hard pressed to follow if I were performing this task.

      Human tasks implement a similar task structure (PMID: 26779747). Please note the response above that outlines the benefits of using this task.

      Before diving into the complex analyses assuming reinforcement learning paradigms or cognitive control, I would have liked to have understood the basic behaviors the rats were taking. For example, what was the typical rate of lever pressing? If the rats are pressing 40 times in 45 minutes, does waiting 8s make a large difference?

      This is a good suggestion. However, rats do not like waiting for rewards, even over small delays. Going from the 4 to the 8 sec delay results in more immediate choices, indicating that the rats forgo waiting in favor of a smaller reinforcer more often at the 8 sec delay than at the 4 sec delay.

      For that matter, the reaction time from lever appearance to lever pressing would be very interesting (and important). Are they making a choice as soon as the levers appear? Are they leaning towards the delay side, but then give in and choose the immediate lever? What are the reaction time hazard distributions?

      These are excellent suggestions. We are looking into implementing them.

      It is not clear that the animals on this task were actually using cognitive control strategies on this task. One cannot assume from the task that cognitive control is key. The authors only consider a very limited number of potential behaviors (an overly simple RL model). On this task, there are a lot of potential behavioral strategies: "win-stay/lose-shift", "perseveration", "alternation", even "random choices" should be considered.

      The strategies the Reviewer mentioned are descriptors of the actual choices the rats made. For example, perseveration means the rat is choosing one of the levers at an excessively high rate whereas alternation means it is choosing the two levers more or less equally, independent of payouts. But the question we are interested in is why? We are arguing that the type of cognitive control determines the choice behavior but cognitive control is an internal variable that guides behavior, rather than simply a descriptor of the behavior. For example, the animal opts to perseverate on the delayed lever because the cognitive control required to track ival is too high. We then searched the neural data for signatures of the two types of cognitive control.

      The delay lever was assigned to the "non-preferred side". How did side bias affect the decisions made?

      The side bias clearly does not impact performance as the animals prefer the delay lever at shorter delays, which works against this bias.

      The analyses based on "group" are unjustified. The authors compare the proportion of delayed to immediate lever press choices on the non-forced trials and then did k-means clustering on this distribution. But the distribution itself was not shown, so it is unclear whether the "groups" were actually different. They used k=3, but do not describe how this arbitrary number was chosen. (Is 3 the optimal number of clusters to describe this distribution?) Moreover, they removed three group 1 sessions with an 8s delay and two group 2 sessions with a 4s delay, making all the group 1 sessions 4s delay sessions and all group 2 sessions 8s delay sessions. They then ignore group 3 completely. These analyses seem arbitrary and unnecessarily complex. I think they need to analyze the data by delay. (How do rats handle 4s delay sessions? How do rats handle 6s delay sessions? How do rats handle 8s delay sessions?). If they decide to analyze the data by strategy, then they should identify specific strategies, model those strategies, and do model comparison to identify the best explanatory strategy. Importantly, the groups were session-based, not rat based, suggesting that rats used different strategies based on the delay to the delayed lever.

      These are excellent points and, as stated above, we are in the process of revisiting the group assignments in an effort to allay these criticisms.
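
      The Reviewer's question about whether k=3 is the optimal number of clusters can be made concrete with a standard check. The following is a minimal sketch, not the procedure used in the manuscript: it runs a simple one-dimensional k-means on a hypothetical list of per-session delayed-choice fractions and compares candidate values of k by mean silhouette score. The data values and the names `kmeans_1d`, `mean_silhouette`, and `delayed_fraction` are assumptions made for illustration only.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar data with quantile-based initial centers."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(values, labels):
    """Mean silhouette score; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    n = len(values)
    total = 0.0
    for i in range(n):
        same = [abs(values[i] - values[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(abs(values[i] - values[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


# Hypothetical fraction of delayed-lever choices in ten sessions.
delayed_fraction = [0.85, 0.90, 0.88, 0.92, 0.80, 0.45, 0.50, 0.40, 0.55, 0.48]
scores = {k: mean_silhouette(delayed_fraction, kmeans_1d(delayed_fraction, k))
          for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

      In this toy data set the two-cluster solution scores highest, which is the kind of evidence that would support or undermine a particular choice of k when applied to the real session distribution.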

      The reinforcement learning model used was overly simple. In particular, the RL model assumes that the subjects understand the task structure, but we know that even humans have trouble following complex task structures. Moreover, we know that rodent decision-making depends on much more complex strategies (model-based decisions, multi-state decisions, rate-based decisions, etc). There are lots of other ways to encode these decision variables, such as softmax with an inverse temperature rather than epsilon-greedy. The RL model was stated as a given and not justified. As one critical example, the RL model fit to the data assumed a constant exponential discounting function, but it is well-established that all animals, including rodents, use hyperbolic discounting in intertemporal choice tasks. Presumably this changes dramatically the effect of 4s and 8s. As evidence that the RL model is incomplete, the parameters found for the two groups were extreme. (Alpha=1 implies no history and only reacting to the most recent event. Epsilon=0.4 in an epsilon-greedy algorithm is a 40% chance of responding randomly.)

      Please see our response above. We agree that the approach was not justified, but we do not agree that it is invalid. Simply stated, a softmax approach gives the best fit to the choice behavior, whereas our epsilon-greedy approach attempted to reproduce the choice behavior using a naïve agent that progressively learns the values of the two levers on a choice-by-choice basis. The epsilon-greedy approach can therefore tell us whether it is possible to reproduce the choice behavior by an agent that is only tracking ival. Given our discovery of an ival-tracking signal in ACC, we believed that this was a critical point (although admittedly we did a poor job of communicating it). However, we also appreciate that important insights can be gained by fitting a model to the data as suggested. In fact, we had implemented this approach initially and are currently reconsidering what it can tell us in light of the Reviewers’ comments.
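
      The difference between the two decision rules discussed in this exchange can be illustrated with a short sketch. It is generic and does not reproduce the model in the manuscript: the function names and the inverse temperature `beta` are assumptions chosen for illustration, and only the value epsilon = 0.4 is taken from the Reviewer's comment. Under epsilon-greedy the probability of choosing the higher-valued lever is fixed by epsilon, whereas under softmax it grows with the value difference.

```python
import math


def epsilon_greedy_p_best(n_options, epsilon):
    """Probability of choosing the highest-valued option under epsilon-greedy.

    With probability epsilon the agent picks uniformly at random (which can
    still land on the best option); otherwise it picks the best option.
    """
    return (1.0 - epsilon) + epsilon / n_options


def softmax_p_first(q_first, q_second, beta):
    """Probability of choosing the first of two options under softmax.

    beta is the inverse temperature: larger beta means more deterministic choice.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_first - q_second)))


# Choice probability for the better option as the value difference grows.
for value_difference in (0.1, 1.0, 5.0):
    p_eps = epsilon_greedy_p_best(n_options=2, epsilon=0.4)
    p_soft = softmax_p_first(value_difference, 0.0, beta=1.0)
    print(value_difference, round(p_eps, 3), round(p_soft, 3))
```

      The epsilon-greedy column stays constant across value differences while the softmax column rises, which is the graded relationship the Reviewer refers to.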

      The authors do add a "dbias" (which is a preference for the delayed lever) term to the RL model, but note that it has to be maximal in the 4s condition to reproduce group 2 behavior, which means they are not doing reinforcement learning anymore, just choosing the delayed lever.

      Exactly. The model results indicated that a naïve agent that relied only on ival tracking would not behave in this manner. Hence, it was unlikely that the G1 animals were using an ival-tracking strategy, even though a strong ival-tracking signal was present in ACC.

      Neurophysiology:

      The neurophysiology figures are unclear and mostly uninterpretable; they do not show variability, statistics or conclusive results.

      While the reviewer is justified in criticizing the clarity of the figures, the statement that “they do not show variability, statistics or conclusive results” is demonstrably false. Each of the figures presented in the manuscript, except Figure 3, is accompanied by statistics and measures of variability. This comment is hyperbolic and not justified.

      Figure 3 was an attempt to show raw neural data to better demonstrate how robust the ivalue tracking signal is.

      As with the behavior, I would have liked to have seen more traditional neurophysiological analyses first. What do the cells respond to? How do the manifolds change aligned to the lever presses? Are those different between lever presses?

      We provide several figures describing how neurons change firing rates in response to varying reward. We are unsure what the reviewer means by “traditional analysis”, especially since this is immediately followed by a request for an assessment of neural manifolds. That said, we are developing ways to make the analysis more intuitive and, hopefully, more “traditional”.

      Are there changes in cellular information (both at the individual and ensemble level) over time in the session?

      We provide several analyses of how firing rate changes over trials in relation to ival over time in the session.

      How do cellular responses differ during that delay while both levers are out, but the rats are not choosing the immediate lever?

      It is not clear to us how this analysis addresses our hypothesis regarding control signals in ACC.

      Figure 3, for example, claims that some of the principal components tracked the number of pellets on the immediate lever ("ival"), but they are just two curves. No statistics, controls, or justification for this is shown. BTW, on Figure 3, what is the event at 200s?

      Figure 3 will be folded into one of the other figures that contains the summary statistics.

      I'm confused. On Figure 4, the number of trials seems to go up to 50, but in the methods, they say that rats received 40 trials or 45 minutes of experience.

      This analysis included forced trials. The maximum for a session is 40 choice trials. We will clarify this in the revised manuscript.

      At the end of page 14, the authors state that the strength of the correlation did not differ by group and that this was "predicted" by the RL modeling, but this statement is nonsensical, given that the RL modeling did not fit the data well, depended on extreme values. Moreover, this claim is dependent on "not statistically detectable", which is, of course, not interpretable as "not different".

      We plan to revisit this analysis and the RL model.

      There is an interesting result on page 16 that the increases in theta power were observed before a delayed lever press but not an immediate lever press, and then that the theta power declined after an immediate lever press.

      Thank you for the positive comment.

      These data are separated by session group (again group 1 is a subset of the 4s sessions, group 2 is a subset of the 8s sessions, and group 3 is ignored). I would much rather see these data analyzed by delay itself or by some sort of strategy fit across delays.

      Provisional analysis indicates that the results hold up over delays, rather than the groupings in the paper. We will address this in a full revision of the manuscript.

      That being said, I don't see how this description shows up in Figure 6. What does Figure 6 look like if you just separate the sessions by delay?

      We are unclear what the reviewer means by “this description”.

      Discussion:

      Finally, it is unclear to what extent this task actually gets at the questions originally laid out in the goals and returned to in the discussion. The idea of cognitive effort is interesting, but there is no data presented that this task is cognitive at all. The idea of a resourced cognitive effort and a resistance cognitive effort is interesting, but presumably the way one overcomes resistance is through resource-limited components, so it is unclear that these two cognitive effort strategies are different.

      We view the strong evidence for ival tracking presented herein as a potentially critical component of resource-based cognitive effort. We hope to clarify how this task engaged cognitive effort.

      The authors state that "ival-tracking" (neurons and ensembles that presumably track the number of pellets being delivered on the immediate lever - a fancy name for "expectations") "taps into a resourced-based form of cognitive effort", but no evidence is actually provided that keeping track of the expectation of reward on the immediate lever depends on attention or mnemonic resources. They also state that a "dLP-biased strategy" (waiting out the delay) is a "resistance-based form of cognitive effort" but no evidence is made that going to the delayed side takes effort.

      There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers. We contend that enduring something you don’t like takes effort.

      The authors talk about theta synchrony, but never actually measure theta synchrony, particularly across structures such as amygdala or ventral hippocampus. The authors try to connect this to "the unpleasantness of the delay", but provide no measures of pleasantness or unpleasantness. They have no evidence that waiting out an 8s delay is unpleasant.

      We will better clarify how our measure of Theta power relates to synchrony. There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers.

      The authors hypothesize that the "ival-tracking signal" (the expectation of number of pellets on the immediate lever) "could simply reflect the emotional or autonomic response". Aside from the fact that no evidence for this is provided, if this were to be true, then, in what sense would any of these signals be related to cognitive control?

      This is proposed as an alternative explanation to the ivalue signal. We provide this as a possibility, never a conclusion. We will clarify this in the revised text. 

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the neuronal signals that underlie resistance vs resource-based models of cognitive effort. The authors use a delayed discounting task and computational models to explore these ideas. The authors find that the ACC strongly tracks value and time, which is consistent with prior work. Novel contributions include quantification of a resource-based control signal among ACC ensembles, and linking ACC theta oscillations to a resistance-based strategy.

      Strengths:

      The experiments and analyses are well done and have the potential to generate an elegant explanatory framework for ACC neuronal activity. The inclusion of local-field potential / spike-field analyses is particularly important because these can be measured in humans.

      Thank you for the endorsement of our work.

      Weaknesses:

      I had questions that might help me understand the task and details of neuronal analyses.

      (1) The abstract, discussion, and introduction set up an opposition between resource and resistance based forms of cognitive effort. It's clear that the authors find evidence for each (ACC ensembles = resource, theta=resistance?) but I'm not sure where the data fall on this dichotomy.

      a. An overall very simple schematic early in the paper (prior to the MCML model? or even the behavior) may help illustrate the main point.

      b. In the intro, results, and discussion, it may help to relate each point to this dichotomy.

      c. What would resource-based signals look like? What would resistance based signals look like? Is the main point that resistance-based strategies dominate when delays are short, but resource-based strategies dominate when delays are long?

      d. I wonder if these strategies can be illustrated? Could these two measures (dLP vs ival tracking) be plotted on separate axes or extremes, and behavior, neuronal data, LFP, and spectral relationships be shown on these axes? I think Figure 2 is working towards this. Could these be shown for each delay length? This way, as the evidence from behavior, model, single neurons, ensembles, and theta is presented, it can be related to this framework, and the reader can organize the findings.

      These are excellent suggestions, and we intend to implement each of them, where possible.

      (2) The task is not clear to me.

      a. I wonder if a task schematic and a flow chart of training would help readers.

      Yes, excellent idea, we intend to include this.

      b. This task appears to be relatively new. Has it been used before in rats (Oberlin and Grahame is a mouse study)? Some history / context might help orient readers.

      Indeed, this task has been used in rats in several prior studies. Please see the following references (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183).

      c. How many total sessions were completed with ascending delays? Was there criteria for surgeries? How many total recording sessions per animal (of the 54?)

      Please note that the delay does not change within a session. There were no criteria for surgery. In addition, we will update Table 1 to make the number of recording sessions clearer.

      d. How many trials completed per session (40 trials OR 45 minutes)? Where are there errors? These details are important for interpreting Figure 1.

      Every animal in this data set completed 40 trials. We will update the task description to clarify this issue. There are no errors in this task; rather, the task is designed to measure the tendency to make an impulsive choice (smaller reward now). We will clarify this issue in the revision of the manuscript.

      (3) Figure 1 is unclear to me.

      a. Delayed vs immediate lever presses are being plotted - but I am not sure what is red, and what is blue. I might suggest plotting each animal.

      We will clarify the colors and look into schemes to graph the data set.

      b. How many animals and sessions go into each data point?

      This information is in Table 1, but this could be clearer, and we will update the manuscript.

      c. Table 1 (which might be better referenced in the paper) refers to rats by session. Is it true that some rats (2 and 8) were not analyzed for the bulk of the paper? Some rats appear to switch strategies, and some stay in one strategy. How many neurons come from each rat?

      Table 1 is accurate, and we can add the number of neurons from each animal.

      d. Task basics - RT, choice, accuracy, video stills - might help readers understand what is going into these plots

      e. Does the animal move differently (i.e., RTs) in G1 vs. G2?

      We will look into ways to incorporate this information.

      (4) I wasn't sure how clustered G1 vs. G2 vs G3 are. To make this argument, the raw data (or some axis of it) might help.

      a. This is particularly important because G3 appears to be a mix of G1 and G2, although upon inspection, I'm not sure how different they really are

      b. Was there some objective clustering criteria that defined the clusters?

      c. Why discuss G3 at all? Can these sessions be removed from analysis?

      These are all excellent suggestions and points. We plan to revisit the strategy to assign sessions to groups, which we hope will address each of these points.

      (5) The same applies to neuronal analyses in Fig 3 and 4

      a. What does a single neuron peri-event raster look like? I would include several of these.

      b. What does PC1, 2 and 3 look like for G1, G2, and G3?

      c. Certain PCs are selected, but I'm not sure how they were selected - was there a criteria used? How was the correlation between PCA and ival selected? What about PCs that don't correlate with ival?

      d. If the authors are using PCA, then scree plots and PETHs might be useful, as well as comparisons to PCs from time-shuffled / randomized data.

      We will make several updates to enhance clarity of the neural data analysis, including adding more representative examples. We feel the need to balance the inclusion of representative examples with groups stats given the concerns raised by R1.

      (6) I had questions about the spectral analysis

      a. Theta has many definitions - why did the authors use 6-12 Hz? Does it come from the hippocampal literature, and is this the best definition of theta? What about other bands (delta, 1-4 Hz; theta, 4-7 Hz; and beta, 13-30 Hz)? These bands are of particular importance because they have been associated with errors and dopamine, and are abnormal in schizophrenia and Parkinson's disease.

      This designation comes mainly from the hippocampal and ACC literature in rodents. In addition, this range best captured the peak in the power spectrum in our data. Note that we focus our analysis on theta given the literature regarding theta in the ACC as a correlate of cognitive control (references in manuscript). We did interrogate other bands as a sanity check and the results were mostly limited to theta. Given the scope of our manuscript and the concerns raised regarding complexity, we are concerned that adding frequency analyses beyond theta would obfuscate the take-home message. However, we think this is worthy, and we will determine if this can be done in a brief, clear, and effective manner.

      b. Power spectra and time-frequency analyses may justify the authors focus. I would show these (y-axis - frequency, x-axis - time, z-axis, power).

      This is an excellent suggestion that we look forward to incorporating. 

      (7) PC3 as an autocorrelation doesn't seem to be the right way to infer theta entrainment or spike-field relationships, as PCA can be vulnerable to phantom oscillations, and coherence can be transient. It is also difficult to compare to traditional measures of phase-locking. Why not simply use spike-field coherence? This is particularly important with reference to the human literature, which the authors invoke.

      Excellent suggestion. We will look into the phantom oscillation issue. Note that PCA provided a way to classify neurons that exhibited peaks in the autocorrelation at theta frequencies. While spike-field coherence is a rigorous tool, it addresses a slightly different question (LFP entrainment). Notwithstanding, we plan to address this issue.  
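
      For readers unfamiliar with the "traditional measures of phase-locking" mentioned by the Reviewer, one common measure is the mean resultant length of the LFP phases at which spikes occur. The sketch below is illustrative only and is not the analysis in the manuscript: it assumes the theta phase at each spike time has already been extracted (for example from a band-pass filtered LFP), uses synthetic phases, and the Rayleigh p-value uses the first-order large-sample approximation exp(-n R^2). All names are hypothetical.

```python
import cmath
import math
import random


def mean_resultant_length(phases):
    """Length of the mean unit vector of the phases (0 = no locking, 1 = perfect locking)."""
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


def rayleigh_p_approx(phases):
    """First-order approximation to the Rayleigh test p-value for non-uniformity."""
    z = len(phases) * mean_resultant_length(phases) ** 2
    return math.exp(-z)


rng = random.Random(1)
# Synthetic spike phases: one unit locked near phase pi, one unit with no phase preference.
locked_phases = [rng.vonmisesvariate(math.pi, 2.0) for _ in range(500)]
unlocked_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(500)]

lock_strength_locked = mean_resultant_length(locked_phases)
lock_strength_unlocked = mean_resultant_length(unlocked_phases)
print(lock_strength_locked, rayleigh_p_approx(locked_phases))
print(lock_strength_unlocked, rayleigh_p_approx(unlocked_phases))
```

      Unlike an autocorrelation-based classification, this measure relates each spike directly to the phase of the field potential, which is the relationship that spike-field coherence also quantifies.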

      Reviewer #3 (Public Review):

      Summary:

      The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort; 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they consistently choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex (ACC) and argue that ACC neurons track the value of the immediate reward option irrespective of the strategy the rats are using. They further argue that the strategy the rats are using modulates their estimated value of the immediate reward option, and that oscillatory activity in the 6-12Hz theta band occurs when subjects use the 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the experiment design, reporting, modelling and analysis which currently preclude high confidence in the validity of the conclusions.

      Strengths:

      The behavioural task used is interesting and the recording methods should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.

      Thank you for the positive comments.

      Weaknesses:

      The dataset is very unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see table 1), with some subjects contributing 9 or 10 sessions and others only one session, and it is not clear from the text why this is the case. Further, only 3 subjects contribute any sessions to one of the behavioural strategies, while 7 contribute data to the other such that apparent differences in brain activity between the two strategies could in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To firm up the conclusion that neural activity is different in sessions where different strategies are thought to be employed, it would be important to account for potential cross-subject variation in the data. The current statistical methods don't do this as they all assume fixed effects (e.g. using trials or neurons as the experimental unit and ignoring which subject the neuron/trial came from).

      This is an important issue that we plan to address with additional analysis in the manuscript update.

      It is not obvious that the differences in behaviour between the sessions characterised as using the 'G1' and 'G2' strategies actually imply the use of different strategies, because the behavioural task was different in these sessions, with a shorter wait (4 seconds vs 8 seconds) for the delayed reward in the G1 strategy sessions where the subjects consistently preferred the delayed reward irrespective of the current immediate reward size. Therefore the differences in behaviour could be driven by difference in the task (i.e. external world) rather than a difference in strategy (internal to the subject). It seems plausible that the higher value of the delayed reward option when the delay is shorter could account for the high probability of choosing this option irrespective of the current value of the immediate reward option, without appealing to the subjects using a different strategy.

      Further, even if the differences in behaviour do reflect different behavioural strategies, it is not obvious that these correspond to allocation of different types of cognitive effort. For example, subjects' failure to modify their choice probabilities to track the changing value of the immediate reward option might be due simply to valuing the delayed reward option higher, rather than not allocating cognitive effort to tracking immediate option value (indeed this is suggested by the neural data). Conversely, if the rats assign higher value to the delayed reward option in the G1 sessions, it is not obvious that choosing it requires overcoming 'resistance' through cognitive effort.

      The RL modelling used to characterise the subject's behavioural strategies made some unusual and arguably implausible assumptions:

      i) The goal of the agent was to maximise the value of the immediate reward option (ival), rather than the standard assumption in RL modelling that the goal is to maximise long-run (e.g. temporally discounted) reward. It is not obvious why the rats should be expected to care about maximising the value of only one of their two choice options rather than distributing their choices to try and maximise long run reward.

      ii) The modelling assumed that the subject's choice could occur in 7 different states, defined by the history of their recent choices, such that every successive choice was made in a different state from the previous choice. This is a highly unusual assumption (most modelling of 2AFC tasks assumes all choices occur in the same state), as it causes learning on one trial not to generalise to the next trial, but only to other future trials where the recent choice history is the same.

      iii) The value update was non-standard in that rather than using the trial outcome (i.e. the amount of reward obtained) as the update target, it instead appeared to use some function of the value of the immediate reward option (it was not clear to me from the methods exactly how the fival and fqmax terms in the equation are calculated) irrespective of whether the immediate reward option was actually chosen.

      iv) The model used an e-greedy decision rule such that the probability of choosing the highest value option did not depend on the magnitude of the value difference between the two options. Typically, behavioural modelling uses a softmax decision rule to capture a graded relationship between choice probability and value difference.

      v) Unlike typical RL modelling where the learned value differences drive changes in subjects' choice preferences from trial to trial, to capture sensitivity to the value of the immediately rewarding option the authors had to add in a bias term which depended directly on this value (not mediated by any trial-to-trial learning). It is not clear how the rat is supposed to know the current trial ival if not by learning over previous trials, nor what purpose the learning component of the model serves if not to track the value of the immediate reward option.

      Given the task design, a more standard modelling approach would be to treat each choice as occurring in the same state, with the (temporally discounted) value of the outcomes obtained on each trial updating the value of the chosen option, and choice probabilities driven in a graded way (e.g. softmax) by the estimated value difference between the options. It would be useful to explicitly perform model comparison (e.g. using cross-validated log-likelihood with fitted parameters) of the authors' proposed model against more standard modelling approaches to test whether their assumptions are justified. It would also be useful to use logistic regression to evaluate how the history of choices and outcomes on recent trials affects the current trial choice, and compare these granular aspects of the choice data with simulated data from the model.

      Each of the issues outlined above with the RL model is very important. We are currently re-evaluating the RL modeling approach in light of these comments. Please see our comments to R1 regarding the model, as they are relevant here as well.
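
      The more standard approach the reviewer describes (a single state, a value update toward the discounted outcome of the chosen option, and a graded decision rule) can be sketched as follows. This is a minimal illustration, not the model used in the manuscript: the function names, the exponential form of discounting, and the parameters `alpha`, `beta`, `gamma` and `epsilon` are all assumptions introduced here.

```python
import math

def softmax_p_delayed(q_immediate, q_delayed, beta):
    """Probability of choosing the delayed option under a two-option softmax.

    The probability changes smoothly with the value difference; beta is a
    hypothetical inverse-temperature parameter.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_delayed - q_immediate)))

def epsilon_greedy_p_delayed(q_immediate, q_delayed, epsilon):
    """Probability of choosing the delayed option under epsilon-greedy.

    With probability epsilon the choice is uniform over the two options, so
    only the sign of the value difference matters, not its size.
    """
    if q_delayed > q_immediate:
        return 1.0 - epsilon / 2.0
    if q_delayed < q_immediate:
        return epsilon / 2.0
    return 0.5

def update_value(q_chosen, reward, delay, alpha, gamma):
    """Move the chosen option's value toward the discounted obtained reward.

    Exponential discounting (gamma ** delay) is an assumption of this sketch;
    other discount functions could be substituted.
    """
    target = (gamma ** delay) * reward
    return q_chosen + alpha * (target - q_chosen)
```

      The contrast with point (iv) is that `softmax_p_delayed` returns a different probability for every value difference, whereas `epsilon_greedy_p_delayed` returns the same probability for any difference of the same sign.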

      There were also some issues with the analyses of neural data which preclude strong confidence in their conclusions:

      Figure 4I makes the striking claim that ACC neurons track the value of the immediately rewarding option equally accurately in sessions where two putative behavioural strategies were used, despite the behaviour being insensitive to this variable in the G1 strategy sessions. The analysis quantifies the strength of correlation between a component of the activity extracted using a decoding analysis and the value of the immediate reward option. However, as far as I could see this analysis was not done in a cross-validated manner (i.e. evaluating the correlation strength on test data that was not used for either training the MCML model or selecting which component to use for the correlation). As such, the chance level correlation will certainly be greater than 0, and it is not clear whether the observed correlations are greater than expected by chance.

      This is an astute observation and we plan to address this concern. We agree that cross-validation may provide an appropriate tool here.
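
      The concern about chance level can be illustrated with a small simulation. The sketch below uses pure noise only: it selects, from many random components, the one most correlated with a random target on one set of trials, and then evaluates that same component on held-out trials. It is not the decoding analysis from the manuscript; the function names and the default trial and component counts are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def selection_bias(n_trials=40, n_components=50, n_repeats=100, seed=0):
    """Mean |r| of the best noise component, in-sample and on held-out trials."""
    rng = random.Random(seed)
    in_sample, held_out = [], []
    for _ in range(n_repeats):
        target_train = [rng.gauss(0, 1) for _ in range(n_trials)]
        target_test = [rng.gauss(0, 1) for _ in range(n_trials)]
        best_train, best_test = -1.0, 0.0
        for _ in range(n_components):
            comp_train = [rng.gauss(0, 1) for _ in range(n_trials)]
            comp_test = [rng.gauss(0, 1) for _ in range(n_trials)]
            r_train = abs(pearson_r(comp_train, target_train))
            if r_train > best_train:
                # Component is chosen using the training trials only.
                best_train = r_train
                best_test = abs(pearson_r(comp_test, target_test))
        in_sample.append(best_train)
        held_out.append(best_test)
    return sum(in_sample) / n_repeats, sum(held_out) / n_repeats
```

      Because every component here is noise, any gap between the two returned values reflects selection alone, which is why a held-out evaluation gives a more appropriate reference for chance.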

      An additional caveat with the claim that ACC is tracking the value of the immediate reward option is that this value likely correlates with other behavioural variables, notably the current choice and recent choice history, that may be encoded in ACC. Encoding analyses (e.g. using linear regression to predict neural activity from behavioural variables) could allow quantification of the variance in ACC activity uniquely explained by option values after controlling for possible influence of other variables such as choice history (e.g. using a coefficient of partial determination).

      This is also an excellent point that we plan to address in the manuscript update.

      Figure 5 argues that there are systematic differences in how ACC neurons represent the value of the immediate option (ival) in the G1 and G2 strategy sessions. This is interesting if true, but it appears possible that the effect is an artefact of the different distribution of option values between the two session types. Specifically, due to the way that ival is updated based on the subjects' choices, in G1 sessions where the subjects are mostly choosing the delayed option, ival will on average be higher than in G2 sessions where they are choosing the immediate option more often. The relative number of high, medium and low ival trials in the G1 and G2 sessions will therefore be different, which could drive systematic differences in the regression fit in the absence of real differences in the activity-value relationship. I have created an ipython notebook illustrating this, available at: https://notebooksharing.space/view/a3c4504aebe7ad3f075aafaabaf93102f2a28f8c189ab9176d4807cf1565f4e3. To verify that this is not driving the effect it would be important to balance the number of trials at each ival level across sessions (e.g. by subsampling trials) before running the regression.

      Excellent point and thank you for the notebook. We explored a similar approach previously but did not pursue it to completion. We will re-investigate this issue.
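
      The balancing step suggested by the reviewer could look roughly like the sketch below, which subsamples trials so that both session types contribute the same number of trials at each ival level. It assumes ival takes (or has been binned into) discrete levels; `balance_trials` and its arguments are hypothetical names, not part of the existing analysis code.

```python
import random
from collections import defaultdict

def balance_trials(g1_ivals, g2_ivals, seed=0):
    """Return trial indices for G1 and G2 with matched counts per ival level.

    Levels that occur in only one session type are dropped entirely.
    """
    rng = random.Random(seed)

    def group_by_level(ivals):
        by_level = defaultdict(list)
        for idx, level in enumerate(ivals):
            by_level[level].append(idx)
        return by_level

    g1 = group_by_level(g1_ivals)
    g2 = group_by_level(g2_ivals)
    keep1, keep2 = [], []
    for level in sorted(set(g1) & set(g2)):
        # Keep as many trials as the scarcer session type has at this level.
        n = min(len(g1[level]), len(g2[level]))
        keep1.extend(rng.sample(g1[level], n))
        keep2.extend(rng.sample(g2[level], n))
    return sorted(keep1), sorted(keep2)
```

      The regression would then be fitted on the retained trials only, ideally repeated over several random subsamples (different `seed` values) to check that the result does not depend on one particular draw.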

    1. Author response:

      Reviewer #3 (Public Review):

      (1) Conditions on growth and interaction rates for feasibility and stability. The authors approach this using a mean field approximation, and it is important to note that there is no particular temperature dependence assumed here: as far as it goes, this analysis is completely general for arbitrary Lotka-Volterra interactions.

      However, the starting point for the authors' mean field analysis is the statement that "it is not possible to meaningfully link the structure of species interactions to the exact closed-form analytical solution for [equilibria] 𝑥^*_𝑖 in the Lotka-Volterra model."

      I may be misunderstanding, but I don't agree with this statement. The time-independent equilibrium solution with all species present (i.e. at non-zero abundances) takes the form

      x^* = A^{-1}r

      where A is the inverse of the community matrix, and r is the vector of growth rates. The exceptions to this would be when one or more species has abundance = 0, or A is not invertible. I don't think the authors intended to tackle either of these cases, but maybe I am misunderstanding that.

      So to me, the difficulty here is not in writing a closed-form solution for the equilibrium x^*, it is in writing the inverse matrix as a nice function of the entries of the matrix A itself, which is where the authors want to get to. In this light, it looks to me like the condition for feasibility (i.e. that all x^* are positive, which is necessary for an ecologically-interpretable solution) is maybe an approximation for the inverse of A---perhaps valid when off-diagonal entries are small. A weakness then for me was in understanding the range of validity of this approximation, and whether it still holds when off-diagonal entries of A (i.e. inter-specific interactions) are arbitrarily large. I could not tell from the simulation runs whether this full range of off-diagonal values was tested.

      We thank the reviewer for pointing this out and we agree that the language used is imprecise. The GLV model is solvable using the matrix inversion method but, as the reviewer notes, this does not give an interpretable expression in terms of the system parameters. This is important as we aim to build understanding of how these parameters (which in turn depend on temperature) affect the richness in communities. We have made this clearer in lines 372-379.

      With regard to the validity of the approximation, we have significantly increased the detail of the method in the manuscript, including the assumptions it makes (lines 384-393). In general the method assumes that any individual interaction has a weak effect on abundance. This will fail when the variation in interactions becomes too strong but should be robust to changes in the average interaction strength across the community.
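
      The reviewer's point about weak off-diagonal entries can be made concrete with a standard series expansion of the inverse. The block below is an illustrative derivation only, not the mean-field method of the manuscript. It assumes the sign convention $\dot{x}_i = x_i (r_i - \sum_j A_{ij} x_j)$, which may differ from the one used in the paper, and it introduces the split $A = D + E$ (with $D$ the diagonal and $E$ the off-diagonal part) purely for this example.

```latex
% Interior equilibrium (all x_i^* > 0) under the assumed sign convention:
A x^* = r \quad\Longrightarrow\quad x^* = A^{-1} r .

% Split A into its diagonal part D and off-diagonal part E:
A = D + E, \qquad
A^{-1} = (I + D^{-1}E)^{-1} D^{-1}
       = \sum_{k=0}^{\infty} \left(-D^{-1}E\right)^{k} D^{-1},
\quad \text{valid when } \rho\!\left(D^{-1}E\right) < 1 .

% Truncating at first order in the inter-specific interactions:
x_i^* \approx \frac{r_i}{A_{ii}} - \sum_{j \neq i} \frac{A_{ij}\, r_j}{A_{ii} A_{jj}} .
```

      Here $\rho$ denotes the spectral radius. The series does not converge once the off-diagonal entries are large relative to the diagonal, so any truncated expression of this kind would be expected to hold only for sufficiently weak inter-specific interactions.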

      As a secondary issue here, it would have been helpful to understand whether the authors' feasible solutions are always stable to small perturbations. In general, I would expect this to be an additional criterion needed to understand diversity, though as the authors point out there are certain broad classes of solutions where feasibility implies stability.

      As the reviewer notes, previous work using the GLV model by ? has shown that feasibility almost surely implies stability in the GLV. Thus we expect that our richness estimates derived from feasibility will closely resemble those from stability. We have amended the main text to make this argument clear on lines 321-335.

      (2) I did not follow the precise rationale for selecting the temperature dependence of growth rate and interaction rates, or how the latter could be tested with empirical data, though I do think that in principle this could be a valuable way to understand the role of temperature dependence in the Lotka-Volterra equations.

      First, as the authors note, "the temperature dependence of resource supply will undoubtedly be an important factor in microbial communities"

      Even though resources aren't explicitly modeled here, this suggests to me that at some temperatures, resource supply will be sufficiently low for some species that their growth rates will become negative. For example, if temperature dependence is such that the limiting resource for a given species becomes too low to balance its maintenance costs (and hence mortality rate), it seems that the net growth rate will be negative. The alternative would be that temperature affects resource availability, but never such that a limiting resource leads to a negative growth rate when a taxon is rare.

      On the other hand, the functional form for the distribution of growth rates (eq 3) seems to imply that growth rates are always positive. I could imagine that this is a good description of microbial populations in a setting where the resource supply rate is controlled independently of temperature, but it wasn't clear how generally this would hold.

      We thank the reviewer for their comment. The assumption of positive growth rates is indeed a feature of the Boltzmann-Arrhenius model of temperature dependence. We use the Boltzmann-Arrhenius model due to the dependence of growth on metabolic rate. As metabolic rate is ultimately determined by biochemical kinetics, its temperature dependence is well described by the Boltzmann-Arrhenius model. In addition to this reasoning, there is a wealth of empirical evidence supporting the use of the Boltzmann-Arrhenius model to describe the temperature dependence of growth rate in microbes.

      Ultimately the temperature dependence of resource supply is not something we can directly consider in our model. As such we have to assume that resource supply is sufficient to maintain positive growth rates in the community. Note that this assumption only requires that resource supply is sufficient to maintain positive growth rates (i.e. the maximal growth rate of species in isolation), not that resource supply is sufficient to maintain growth in the presence of intra- and interspecific competition. We have updated the manuscript in lines 156-159 to make these assumptions clearer.
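
      The positivity of growth rates under a Boltzmann-Arrhenius form follows directly from the exponential, as the short sketch below shows. It uses the common reference-temperature parameterisation; the function name, the reference temperature of 283.15 K and the example activation energy are assumptions for illustration and are not taken from the manuscript.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

def boltzmann_arrhenius(temp_k, b0, e_a, t_ref_k=283.15):
    """Rate at temperature temp_k (kelvin), equal to b0 at t_ref_k.

    b0 is the rate at the reference temperature and e_a the activation
    energy in eV; both are hypothetical inputs in this sketch.
    """
    exponent = -e_a / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / t_ref_k)
    return b0 * math.exp(exponent)

# Example: a rate that is 1.0 at 10 degrees C, evaluated at 20 degrees C.
rate_at_20c = boltzmann_arrhenius(293.15, b0=1.0, e_a=0.65)
```

      For any positive `b0` the returned rate is positive at every temperature, which is the property the reviewer highlights: this form cannot by itself produce a negative net growth rate.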

      Secondly, while I understand that the growth rate in the exponential phase for a single population can be measured to high precision in the lab as a function of temperature, the assumption for the form of the interaction rates' dependence on temperature seems very hard to test using empirical data. In the section starting L193, the authors seem to fit the model parameters using growth rate dependence on temperature, but then assume that it is reasonable to "use the same thermal response for growth rates and interactions". I did not follow this, and I think a weakness here is in not providing clear evidence that the functional form assumed in Equation (4) actually holds.

      The reviewer is correct: it is very difficult to measure interaction coefficients experimentally, and to our knowledge there is little to no data available on their empirical temperature responses. As a best guess, we use the observed variation in thermal physiology parameters for growth rate as a proxy, assuming that interactions must also depend on the metabolic rates of the interacting species (see also response to comment 8).

    1. Author response:

      Reviewer #3 (Public Review):

      The paper by Rai and colleagues examines the transcriptional response of Candida glabrata, a common human fungal pathogen, during interaction with macrophages. They use RNA PolII profiling to identify not just the total transcripts but instead focus on the actively transcribing genes. By examining the profile over time, they identify particular transcripts that are enriched at each timepoint, and build a hierarchical model for how a transcription factor, Xbp1, may regulate this response. Due to technical difficulties in identifying direct targets of Xbp1 during infection, the authors then turn to the targets of Xbp1 during cellular quiescence.

      The authors have generated a large and potentially impactful dataset, examining the responses of C. glabrata during an important host-pathogen interface. However, the conclusions that the authors make are not well supported by the data. The ChIP-seq is interesting, but the authors make conclusions about the biological processes that are differentially regulated without testing them experimentally. Because Candida glabrata has a significant percent of the genome without GO term annotation, the GO term enrichment analysis is less useful than in a model organism. To support these claims, the authors should test the specific phenotypes, and validate that the transcriptional signature is observed at the protein level.

      Additionally, the authors should also include images of the infections, along with measurements of phagocytosis, to show that the time points are appropriate. At 30 minutes, are C. glabrata actually internalized or just associated? This may explain the difference in adherence genes at the early timepoint. For example, in Lines 123-132, the authors could measure the timing of ROS production by macrophages to determine when these attacks are deployed, instead of speculating based on the increased transcription of DNA damage response genes. Potentially, other factors could be influencing the expression of these proteins. At the late stage of infection, the authors should measure whether the C. glabrata cells are proliferating, or if they have escaped the macrophage, as other fungi can during infection. This may explain some of the increase in transcription of genes related to proliferation.

      An additional limitation to the interpretation of the data is that the authors should put their work in the context of the existing literature on C. albicans temporal adaptation to macrophages, including recent work from Munoz (doi: 10.1038/s41467-019-09599-8), Tucey (doi: 10.1016/j.cmet.2018.03.019), and Tierney (doi: 10.3389/fmicb.2012.00085), among others.

      When comparing the transcriptional profile between WT and xbp1 mutant, it is not clear whether the authors compared the strains under non-stress conditions. The authors should include an analysis of the wild-type to xbp1 mutants in the absence of macrophage stress, as the authors' claims of precocious transcription may be a function of overall decreased transcriptional repression, even in the absence of the macrophage stress. The different cut-offs used to call peaks in the two strain backgrounds are also somewhat concerning; it is not clear to me whether that will obscure the transcriptional signature of each of the strains. Additionally, the authors go on to show that the xbp1 mutant has a significant proliferation defect in macrophages, so potentially this could confound the PolII binding sites if the cells are dying.

      In the section on hierarchical analysis of transcription factors, at least one epistasis experiment should have been performed to validate the functional interaction between Xbp1 and a particular transcription factor. If the authors propose a specific motif, they should test this experimentally through EMSA assays to fully test that the motif is functional.

      The jump from macrophages to quiescent culture is also not well justified. If the transcriptional program is so dynamic during a timecourse of macrophage infection, it is hard to translate the findings from a quiescent culture to this host environment.

      Overall, there is a strong beginning and the focus on active transcription in the macrophage is an exciting approach. However, the conclusions need additional experimental evidence.

      We thank this reviewer for the critical analysis of our manuscript and the comments.

      We fully agree that the jump from macrophages to quiescent culture is not well justified. We have successfully performed CgXbp1 ChIP-seq during macrophage infection and have rewritten the manuscript according to the new results. With the CgXbp1 ChIP-seq data during macrophage infection added, we have removed the data related to quiescence to focus the paper on the macrophage response. Because of this, we have also removed the DNA binding motif analysis from this work and will report the findings in a separate manuscript comparing CgXbp1 binding between macrophage response and quiescence.

      As mentioned above, the RNAPII ChIP-seq time course experiment compared RNAP occupancies at different times during infection to the first infection time point. We did not calculate occupancies relative to the data in the absence of stress (e.g. before infection), because Xbp1 is expressed at a low level and is induced by stress. Hence its role under no-stress conditions is expected to be smaller than inside macrophages. In addition, up-regulation of its target genes depends on the presence of their transcriptional activators under the experimental conditions, which is going to be very different in normal growth media (RPMI or YPD; i.e. before infection) versus inside macrophages. Hence, comparing to normal growth media would not show the real CgXbp1 effects and/or the CgXbp1 effect might be different. In fact, this can be seen from the new RNAseq analysis of wildtype and Cgxbp1∆ C. glabrata cells in the presence and absence of fluconazole (which has been added to the revised manuscript to study CgXbp1’s role in fluconazole resistance). The result shows that CgXbp1 (which was expressed at a low level) has a very small effect on global expression and the up-regulated genes are mainly related to transmembrane transport. More importantly, the effect of the Cgxbp1∆ mutant on TCA cycle and amino acid biosynthesis genes’ expression during macrophage infection is not observed when the mutant is grown under normal growth conditions (YPD without fluconazole). Therefore, the results show that CgXbp1 has condition-specific effects on global gene expression, which are also dependent on the transcriptional activators present in the cell. The result of the new RNAseq analysis of wildtype and Cgxbp1∆ C. glabrata cells in the absence of fluconazole is described in lines 329-339 as follows: “On the other hand, 135 genes were differentially expressed in the Cgxbp1∆ mutant during normal exponential growth (i.e. no fluconazole treatment) (Figure 6c) with up-regulated genes highly enriched with the “transmembrane transport” function and down-regulated genes associated with different metabolic processes (e.g. carbohydrate, glycogen and trehalose) (e.g. carbon metabolism, nucleotide metabolism, and transmembrane transport, etc.) (Supplementary Table 12). Interestingly, the TCA cycle and amino acid biosynthesis genes, whose expressions were accelerated in the Cgxbp1∆ mutant during macrophage infection (Figure 3C, 3D), were not affected by the loss of CgXbp1 function under normal growth conditions (i.e. in YPD media without fluconazole) (Supplementary Figure 11, Supplementary Table 11), suggesting that the overall (direct and indirect) effects of CgXbp1 are condition-specific.”

      Regarding the comment about RNAPII binding being affected by dying cells, our observation of reduced proliferation does not mean that the cells were dying, because we did observe an increase in cell numbers over time (i.e. the cells were proliferating), but the rate of proliferation was slower in the Cgxbp1∆ mutant compared to wildtype. Presumably, the reduced proliferation and/or growth within macrophages is due to poorer adaptation in and compromised response to macrophages.

      We have also discussed our findings in the context of the suggested (and other) literature in various parts of the Discussion.

      Reviewer #4 (Public Review):

      Macrophages are the first line of defense against invading pathogens. C. glabrata must interact with these cells as do all pathogens seeking to establish an infection. Here, a ChIP-seq approach is used to measure levels of RNA polymerase II across Cg genes in a macrophage infection assay. Differential gene expression is analyzed with increasing time of infection. These differentially expressed genes are compared at the promoter level to identify potential transcription factors that may be involved in their regulation. A factor called CgXbp1 on the basis of its similarity to the S. cerevisiae Xbp1 protein is characterized. ChIP-seq is done on CgXbp1 using in vitro grown cells and a potential binding site identified. Evidence is provided that CgXbp1 affects virulence in a Galleria system and that this factor might impact azole resistance.

      As the authors point out, candidiasis associated with C. glabrata has dramatically increased in the recent past. Understanding the unique aspects of this Candida species would be of great value in trying to unravel the basis of the increasing fungal disease caused by C. glabrata. The use of ChIP-seq analysis to assess the time-dependent association of RNA polymerase II with Cg genes is a nice approach. Identification of CgXbp1 as a potential participant in the control of this gene expression program is also interesting. Unfortunately, this work suffers by comparison to a significant amount of previous effort that renders the progress detailed here incremental at best.

      I agree that their ChIP-seq time course of RNA polymerase II distribution across the Cg genome is both elegant and an improvement on previous microarray experiments. However, these microarray experiments were carried out 14 years ago and while the current work is certainly at higher resolution, little more can be gleaned from the current work. The authors argue that standard transcriptional analysis is compromised by transcript stability effects. I would suggest that, while no approach is without issues, quite a bit has been learned from approaches like RNA-seq and there are recent developments to this technique that allow for a focus on newly synthesized mRNA (thiouridine labeling).

      The CgXbp1 characterization relies heavily on work from S. cerevisiae. This is disappointing as conservation of functional links between C. glabrata and S. cerevisiae is not always predictable.

      The effects caused by loss of CgXBP1 on virulence (Figure 4) may be statistically significant but are modest. No comparison is shown for another gene that has already been accepted to have a role in virulence to allow determination of the biological importance of this effect.

      The phenotypic effects of the loss of XBP1 on azole resistance look rather odd (Figure 6). The appearance of fluconazole resistant colonies in the xbp1 null strain occurs at a very low frequency and seems to resemble the appearance of rho0 cells in the population. The vast majority of xbp1 null cells do not exhibit increased growth compared to wild-type in the presence of fluconazole.

      Irrespective of the precise explanation, more analysis should be performed to confirm that CgXbp1 is negatively regulating the genes suggested in Figure 6A to be responsible for the increased fluconazole resistance.

      Additionally, the entire analysis of CgXbp1 is based on ChIP-seq performed using cells grown under very different conditions than the RNA polymerase II study. Evidence should be provided that the presumptive CgXbp1 target genes actually impact the expression profiles established earlier.

      We thank this reviewer for the critical analysis of our manuscript. We have done the following to address the comments. As a result, the manuscript is significantly improved.

      • The ChIP-seq data of Xbp1 in macrophages has been successfully generated and the result is now presented in Figure 2C-2F, and lines 182-227 of the revised manuscript. With this addition, we have removed the ChIP-seq data related to quiescence from the revised manuscript and rewritten the manuscript to focus on the role of Xbp1 in macrophages.

      • We agree that the conservation of functional links between C. glabrata and S. cerevisiae is not always predictable. That is the reason why we did not rely solely on the S. cerevisiae network for inferring Xbp1’s functions, and undertook several different approaches (e.g. ChIP-seq of Xbp1 and characterization of the Cgxbp1∆ mutant) to delineate its functions.

      • We also agree that the virulence effect is modest, but it is, nevertheless, an effect that may contribute to the overall virulence of C. glabrata. Since virulence is a pleiotropic trait involving many genes and every gene affects different aspects of the complex process, we feel that it is not fair to penalize a given gene based on its (weaker) effect relative to another gene. Therefore, we respectfully disagree that another gene should be included for benchmarking the effect.

      • We have measured C. glabrata cell numbers in a time course experiment. The result (presented in Figure 4A) showed that there was an increase in cell number at the end of the macrophage infection time course experiment (e.g. 8 hr). We have highlighted this information on lines 278-283.

      • Additional analysis of the fluconazole resistance phenotype of the Cgxbp1∆ mutant has been added, including standard MIC assays. The results are presented in Figure 5C-5E.

      • As suggested and to understand the role of CgXbp1 on fluconazole resistance, we have now carried out RNAseq analysis of WT and the Cgxbp1∆ mutant in the presence and absence of fluconazole. The genes differentially controlled in the Cgxbp1∆ mutant have been identified and a proposed model on how CgXbp1 affects fluconazole resistance is added to Figure 7 in the revised manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors conducted cross-species comparisons between the human brain and the macaque brain to disentangle the specific characteristics of structural development of the human brain. Although previous studies had revealed similarities and differences in brain anatomy between the two species by spatially aligning the brains, the authors made the comparison along the chronological axis by establishing models that predict chronological age from brain structural features. The rationale is clear given that brain development occurs over time in both species. More interestingly, the model trained on macaque data was better able to predict the age of humans than the human-trained model was at predicting macaque age. This revealed a brain cross-species age gap (BCAP) that quantified the discrepancy in brain development between the two species, and the authors even found this BCAP measure was associated with performance on behavioral tests in humans. Overall, this study provides important and novel insights into the unique characteristics of human brain development. The authors have employed a rigorous scientific approach, reflecting diligent efforts to scrutinize the patterns of brain age models across species. The clarity of the rationale, the interpretability of the methods, and the quality of the presentation all contribute to the strength of this work.

      We are grateful for your helpful and thorough review and for being so positive about our manuscript. Following your recommendations, we have added more analytic details that have strengthened our paper. We would like to thank you for your input.

      Reviewer #2 (Public Review):

      In the current study, Li et al. developed a novel approach that aligns chronological age to a cross-species brain age prediction model to investigate the evolutionary effect. This method revealed some interesting findings, like the brain-age gap of the macaque model in predicting human age will increase as chronological age increases, suggesting an evolutionary alignment between the macaque brain and the human brain in the early stage of development. This study exhibits ample novelty and research significance. However, I still have some concerns regarding the reliability of the current findings.

      We thank you for the positive and appreciative feedback on our work and the insightful comments, which we have addressed below.

      Question 1: Although the authors named their new method a "cross-species" model, the current study only focused on the prediction between humans and macaques. It would be better to discuss whether their method can also generalize to cross-species examination of other species (e.g., C. elegans), which may provide more comprehensive evolutionary insights. Also, other future directions with their new method are worth discussing.

      We appreciate your insightful comment regarding the generalizability of our model to other species. As you said, we performed only a human-macaque cross-species study and did not include other species. We focused on humans and macaques because the macaque is considered to be one of the closest primates to humans apart from chimpanzees, and thus is considered to be the best model for studying human brain evolution. However, our proposed method has limitations that restrict its generalizability to other species, e.g., C. elegans. First, our model was trained using MRI data, which limits its applicability to species for which such data is unavailable. This technological requirement poses a barrier to broader cross-species application. Second, our current model is based on homologous brain atlases that are available for both humans and macaques. The lack of comparable atlases for other species further restricts the model's generalizability. We have discussed this limitation in the revised manuscript and outlined potential future directions to overcome these challenges. This includes discussing the need for developing comparable imaging techniques and standardized brain atlases across a wider range of species to enhance the model's applicability and broaden our understanding of cross-species neurodevelopmental patterns.

      On page 15, lines 11-18

      “However, the existing limitation should be noted regarding the generalizability of our proposed approach for cross-species brain comparison. Our current model relies on homologous brain atlases, and the lack of comparable atlases for other species restricts its broader applicability. To address this limitation, future research should focus on developing prediction models that do not depend on atlases. For instance, 3D convolutional neural networks could be trained directly on raw MRI data for age prediction. These deep learning models may offer greater flexibility for cross-species applications once the training within species is complete. Such advancements would significantly enhance the model's adaptability and expand its potential for comparative neuroscience studies across a wider range of species.”

      Question 2: Algorithm of prediction model. In the method section, the authors only described how they chose features, but provided no description of the algorithm (e.g., support vector regression) they used. Please add relevant descriptions to the methods.

      Thank you for your comment. We apologize for not providing sufficient details about the model training process in our initial submission. In our study, we used a linear regression model for prediction. We have provided more details regarding the algorithm of prediction model in our response to Reviewer #1. For your convenience, we have attached them below.

      For details on the algorithm of prediction model:

      “A linear regression model was adopted for intra- and inter-species age prediction. The linear regression model was built including the following three main steps: 1) Feature selection: a total of two steps are required to extract the final features. The first step is preliminary extraction. First, all the human or macaque participants were divided into 10-fold and 9-fold was used for model training and 1-fold for model test. The preliminary features were chosen by identifying the significantly age-associated features with p < 0.01 during calculating Pearson’s correlation coefficients between all the 260 features and actual ages of the 9-fold subjects. This process was repeated 100 times. Since we obtained not exactly the same preliminary features each time, we thus further analyzed the preliminary features using two methods to determine the final features: common features and minimum mean absolute error (min MAE). Common features are the preliminary features that were selected in all the 100 times during preliminary model training. The min MAE features were the preliminary features that with the smallest MAE value during the 100 times model test for predicting age. After the above feature selections, we obtained two sets of features: 62 macaque features and 225 human features (common features) and 117 macaque features and 239 human features (min MAE). In addition, to further exclude the influences of unequal number of features in human and macaque, we also selected the first 62 features in human and macaque to test the model prediction performances. 2) Model construction: we conducted age prediction linear model using 10-fold cross-validation based on the selected features for human and macaque separately. The linear model parameters are obtained using the training set data and applied to the test set for prediction. The above process is also repeated 100 times. 
3) Prediction: with the above results, we obtained the optimal linear prediction models for human and macaque. Next, we performed intra-species and inter-species brain age prediction, i.e., human model predicted human age, human model predicted macaque age, macaque model predicted macaque age and macaque model predicted human age. Three sets of features (62 macaque features and 225 human features; 117 macaque features and 239 human features; 62 macaque features and 62 human features) were used to test the prediction models for cross-validation and to exclude effects of different number of features in human and macaque. In the main text, we showed the results of brain age prediction, brain developmental and evolutional analyses based on common features and the results obtained using other two types of features were shown in supplementary materials. The prediction performances were evaluated by calculating the Pearson’s correlation and MAE between actual ages and predicted ages.”
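
      The three steps quoted above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names are hypothetical, the p-value uses a Fisher z approximation (the response does not state which test was used), and a single 10-fold run stands in for the 100 repetitions and the common/min-MAE feature consolidation.

```python
import math
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z approximation (an assumption)."""
    r = min(max(r, -0.999999), 0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(X, age, alpha=0.01):
    """Indices of columns whose correlation with age has p < alpha."""
    n = len(age)
    return [j for j in range(X.shape[1])
            if approx_p_value(pearson_r(X[:, j], age), n) < alpha]


def cross_validated_prediction(X, age, n_folds=10, seed=0):
    """Select features and fit the linear model on the training folds only,
    then predict the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(age)), n_folds)
    predicted = np.empty(len(age))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(age)), test_idx)
        keep = select_features(X[train_idx], age[train_idx])
        # Design matrices with an intercept column.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx][:, keep]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, keep]])
        coef, *_ = np.linalg.lstsq(A_train, age[train_idx], rcond=None)
        predicted[test_idx] = A_test @ coef
    return predicted


# Synthetic stand-in data: 200 subjects, 260 features, 20 of them age-related.
rng = np.random.default_rng(42)
age = rng.uniform(8, 22, size=200)
X = rng.normal(size=(200, 260))
X[:, :20] += 0.5 * (age[:, None] - age.mean())

predicted = cross_validated_prediction(X, age)
r = pearson_r(age, predicted)
mae = float(np.mean(np.abs(age - predicted)))
print(f"Pearson r = {r:.3f}, MAE = {mae:.3f}")
```

      Selecting features inside each training fold, as the quoted procedure does, keeps the held-out fold from influencing which features enter the model.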

      Question 3: Sex difference. The sex difference results are strange to me. For example, in the second row of Figure Supplement 3A, different models show different correlation patterns, but why their Pearson's r is all equal to 0.3939? If they are only typo errors, please correct them. The authors claimed that they found no sex difference. However, the results in Figure Supplement 3 show that, the female seems to have poorer performance in predicting macaque age from the human model. Moreover, accumulated studies have reported sex differences in developing brains (Hines, 2011; Kurth et al., 2021). I think it is also worth discussing why sex differences can't be found in the evolutionary effect.

      Reference:

      Hines, M. (2011). Gender development and the human brain. Annual review of neuroscience, 34, 69-88.

      Kurth, F., Gaser, C., & Luders, E. (2021). Development of sex differences in the human brain. Cognitive Neuroscience, 12(3-4), 155-162.

      It is recommended that the authors explore different prediction models for different species. Maybe macaques are suitable for linear prediction models, and humans are suitable for nonlinear prediction models.

      Thank you for pointing out the typos and for your comments on sex differences. The Pearson’s r values in Figure Supplement 3A contained typos, which we have corrected in the updated Figure 2-figure supplement 3. For details, please see the updated Figure 2-figure supplement 3 and the following figure.

      Regarding gender effects, we acknowledge your point about the importance of gender differences in understanding brain evolution and development. In our study, however, our primary goal was to develop a robust age prediction model by maximizing the number of training samples. To mitigate gender-related effects in our main results, we incorporated gender information as a covariate in the ComBat harmonization process. We conducted the supplementary analysis, which separated the data by gender, only to demonstrate the stability of our proposed cross-species age prediction model, not to investigate gender differences. Although our results demonstrated that gender-specific models could still significantly predict chronological age, we refrained from emphasizing these models' performance in gender-specific species comparisons because the predicted gender differences are difficult to interpret. For cross-species prediction, it is not convincing to conclude that a higher Pearson’s r value between actual age and predicted age reflects more conserved evolution in males or in females. In addition, we adopted the same prediction model, rather than different models, for human and macaque in order to establish a comparable model between species. Generally speaking, a nonlinear model can obtain better prediction accuracy than a linear model. If different species used different models, cross-species prediction would be unfair. Importantly, our study aimed to develop a new index based on the same prediction models to quantify brain evolution differences, i.e., the brain cross-species age gap (BCAP), instead of relying on traditional statistical analyses. Different prediction models for different species may introduce bias caused by the prediction methods and thus impact the accuracy of BCAP. Thus, for cross-species prediction we adopted the linear model with the best prediction performance in intra-species prediction.
Although our main goal in this study was to set up a stable cross-species prediction model, and the models built using either male or female subjects performed well during cross-species prediction, we agree that how to characterize evolutionary gender differences without bias using machine learning approaches needs further investigation, since there are many reports of gender differences in the developing human brain. In fact, whether macaque brains have the same gender differences as humans is an interesting scientific question worth studying. Thus, we have included a discussion on how to use machine learning methods to study evolutionary gender differences in our revised manuscript.

      On page 15, lines 18-23 and page 16, line 1-4

      “Many studies have reported sex differences in developing human brains (Hines, 2011; Kurth, Gaser, & Luders, 2021), however, whether macaque brains have similar sex differences as humans is still unknown. We used machine learning method for cross-species prediction to quantify brain evolution and the established prediction models are stable even when only using male or female data, which may indicate that the proposed cross-species prediction model has no evolutionary sex difference. Although the stable prediction model can be established in either male or female participants for cross-species prediction, this indeed does not mean that there are no evolutionary sex differences due to lack of quantitative comparative analysis. In the future, we need to develop more objective, quantifiable and stable index for studying sex differences using machine learning methods to further identify sex differences in the evolved brain”

      Reviewer #3 (Public Review):

      The authors identified a series of WM and GM features that correlated with age in human and macaque structural imaging data. The data was gathered from the HCP and WA studies, which was parcellated in order to yield a set of features. Features that correlated with age were used to train predictive intra and inter-species models of human and macaque age. Interestingly, while each model accurately predicted the corresponding species age, using the macaque model to predict human age was more accurate than the inverse (using the human model to predict macaque age). In addition, the prediction error of the macaque model in predicting human age increased with age, whereas the prediction error of the human model predicting macaque age decreased with age.

      After elaboration of the predictive models, the authors classified the features for prediction into human-specific, macaque-specific and common to human and macaque, where they most notably found that macaque-only and common human-macaque areas were located mainly in gray matter, with only a few human-specific features found in gray matter. Furthermore, the authors found significant correlations between BCAP and picture vocabulary (positive correlation) test and visual sensitivity (negative correlation) test. Several white matter tracts (AF, OR, SLFII) were also identified showing a correlation with BCAP.

      Thank you for providing this excellent summary. We appreciate your thorough review and concise overview of our work.

      STRENGTHS AND WEAKNESSES

      The paper brings an interesting perspective on the evolutionary trajectories of human and non-human primate brain structure, and its relation to behavior and cognition. Overall, the methods are robust and support the theoretical background of the paper. However, the overall clarity of the paper could be improved. There are many convoluted sentences and there seems to be both repetition across the different sections and unclear or missing information. For example, the Introduction does not clearly state the research questions, rather just briefly mentions research gaps existing in the literature and follows by describing the experimental method. It would be desirable to clearly state the theoretical background and research questions and leave out details on methodology. In addition, the results section repeats a lot of what is already stated in the methods. This could be further simplified and make the paper much easier to read.

      In the discussion, authors mention that "findings about cortex expansion are inconsistent and even contradictory", a more convincing argument could be made by elaborating on why the cortex expansion index is inadequate and how BCAP is more accurate.

      Thank you for highlighting the interesting aspects of our work. We are sorry for the lack of clarity in certain parts of our manuscript. Following your valuable suggestions, we have revised the manuscript to reduce unnecessary repetition and to state our research question more clearly in the Introduction. Specifically, unlike previous analyses of human and macaque evolution using comparative neuroscience, this study embeds a chronological axis into the cross-species evolutionary analysis. It constructs a linear prediction model of brain age for humans and macaques and quantitatively describes the degree of evolution. The brain structure-based cross-species age prediction model and the cross-species brain age differences proposed in this study further eliminate the inherent developmental effects of humans and macaques on cross-species evolutionary comparisons, providing new perspectives and approaches for studying cross-species development. Regarding the repetition in the Results section, we have simplified it for clarity. Regarding the comparison between the cortex expansion index and BCAP, we would like to emphasize that the cortex expansion index was derived without fully considering cross-species alignment along the chronological axis. Specifically, this index does not correspond to a specific developmental stage, but rather focuses on a direct comparison between the two species. In contrast, BCAP addresses this limitation by utilizing a prediction model to establish alignment (or misalignment) between species at the individual level. Therefore, BCAP may serve as a more flexible and nuanced tool for cross-species brain comparison.

      STUDY AIMS AND STRENGTH OF CONCLUSIONS

      Overall, the methods are robust and support the theoretical background of the paper, but it would be good to state the specific research questions -even if exploratory in nature- more specifically. Nevertheless, the results provide support for the research aims.

      Thank you for the excellent suggestion. We have revised our Introduction to state the specific research question, as mentioned above.

      IMPACT OF THE WORK AND UTILITY OF METHODS AND DATA TO THE COMMUNITY

      This study is a good first step in providing a new insight into the neurodevelopmental trajectories of humans and non-human primates besides the existing cortical expansion theories.

      Thank you for your encouraging comment.

      ADDITIONAL CONTEXT:

      It should be clearly stated both in the abstract and methods that the data used for the experiment came from public databases.

      Thank you for your suggestion. We have added this information to both the Abstract and the Methods. For details, please see page 2, line 9 in the Abstract section; page 16, lines 10-11 and page 17, lines 6-10 in the Materials and Method section.

    1. Author response:

      Reviewer #1 (Public Review):

      Using structural analysis, Bonchuk and colleagues demonstrate that the TTK-like BTB/POZs of insects form stable hexameric assemblies composed of trimers of POZ dimers, a configuration observed consistently across both homomultimers and heteromultimers, which are known to be formed by TTK-like BTB/POZ domains. The structural data is comprehensive, unambiguous, and further supported by theoretical fold prediction analyses. In particular the judicious complementation of experiments and fold prediction is commendable. This study now adds an important cog that might help generalize the general principles of the evolution of multimerization in members of this fold family.

      I strongly feel that enhancing the inclusivity of the discussion would strengthen the paper. Below, I suggest some additional points for consideration for the same.

      Major points.

      1) It would be valuable to discuss alternative multimer assembly interfaces, considering the diverse ways POZs can multimerize. For instance, the Potassium channel POZ domains form tetramers. A comparison of their inter-subunit interface with that of TTK and non-TTK POZs could provide insightful contrasts.

      Thanks for the suggestion. We added this important comparison, as well as a comparison with recently published structures of filament-forming BTB domains.

      2) The so-called TTK motif, despite its unique sequence signature, essentially corresponds to the N-terminal extension observed in other "non-TTK" proteins such as Miz-1. Given Miz-1's structure, it becomes evident that the utilization of the N-terminal extension for dimerization is shared with the TTK family, suggesting a common evolutionary origin in metazoan transcription factors. Early phylogenetic trees (e.g. in PMID: 9917379) support the grouping of the TTK-like POZs with other animal Transcription factors containing POZ domains such as those with Kelch repeats further suggesting that the extension might be ancestral. Structural investigations by modeling prominent examples or comparing known structures of similar POZ domains, could support this inference. Control comparisons with POZ domains from fungi, plants and amoebozoans like Dictyostelium could offer additional insights.

      We performed AlphaFold2-Multimer modeling of dimers of all BTB domains from the most ancestral metazoan clades, Placozoa and Porifera, along with BTBs from Choanoflagellates, the unicellular eukaryotes closest to the first metazoans. The presence of an N-terminal beta-sheet was evaluated. KLHL-BTBs are present in all eukaryotes and are likely predecessors of ZBTB domains. According to AlphaFold modeling of dimers, all KLHL-BTB domains of plants and basal metazoans have the alpha1 helix, but most of these domains do not possess the additional N-terminal beta-strand (beta1) characteristic of ZBTB domains. We found only one KLHL-BTB (Uniprot ID: AA9VCT1_MONBE) with such an N-terminal extension in the Choanoflagellate proteome, one in the Dictyostelium proteome (Q54F31_DICDI), and 7 (out of 43 BTB domains in total) and 13 (out of 81) such domains in the Trichoplax and Amphimedon proteomes, respectively. There was no significant similarity of the beta1 element at the level of primary sequence. However, most of these domains bear a 3-box/BACK extension and represent typical KLHL-BTBs, which are members of E3 ubiquitin-ligase complexes and are often associated with a protein-protein-interacting MATH domain or WD40 repeats. We found only one protein in the Trichoplax proteome with a beta1 strand but without 3-box/BACK (B3RQ74_TRIAD), thus resembling the ZBTB topology. Thus, BTB domains of this subtype likely emerged early in metazoan evolution. At this point ZBTBs were not yet associated with zinc fingers. According to our survey, the actual fusion of the ZBTB domain with zinc-finger domains occurred during the evolution of early bilaterian organisms, since proteins with such a domain architecture are not found in Radiata but are present in basal Protostomia and Deuterostomia clades. The TTK-type sequence is characteristic only of Arthropoda and emerged early in their evolution. We added all these data to the article.

      3) Exploring the ancestral presence of the aforementioned extension in metazoan transcription factors could serve as a foundation for understanding the evolutionary pathway of hexamerization. This analysis could shed light on exposed structural regions that had the potential to interact post-dimerization with the N-terminal extension and also might provide insights into the evolution of multimer interfaces, as observed in the Potassium channel.

      We added this important comparison, as well as a comparison with recent structures of filament-forming BTB domains.

      4) Considering the role of conserved residues in the multimer interface is crucial. Reference to conserved residues involved in multimer formation, such as discussed in PMID: 9917379, would enrich the discussion.

      We updated our description of multimer interface with respect to conservation of residues.

      Reviewer #2 (Public Review):

      BTB domains are protein-protein interaction domains found in diverse eukaryotic proteins, including transcription factors. It was previously known that many of the Drosophila transcription factor BTB domains are of the TTK-type - these are defined as having a highly-conserved motif, FxLRWN, at their N-terminus, and they thereby differ from the mammalian BTB domains. Whereas the well-characterised mammalian BTB domains are dimeric, several Drosophila TTK-BTB domains notably form multimers and function as chromosome architectural proteins. The aims of this work were (i) to determine the structural basis of multimerisation of the Drosophila TTK-BTB domains, (ii) to determine how different Drosophila TTK-BTB domains interact with each other, and (iii) to investigate the evolution of this subtype of BTB domain.

      The work significantly advances our understanding of the biology of BTB domains. The conclusions of the paper are mostly well-supported, although some aspects need clarification:

      Hexameric organisation of the TTK-type BTB domains:

      Using cryo-EM, the authors showed that the CG6765 TTK-type BTB domain forms a hexameric assembly in which three "classic" BTB dimers interact via a beta-sheet interface involving the B3 strand. This is particularly interesting, as this region of the BTB domain has recently been implicated in protein-protein interactions in a mammalian BTB-transcription factor, MIZ1. SEC-MALS analysis indicated that the LOLA TTK-type BTB domain is also hexameric, and SAXS data was consistent with a hexameric assembly of the CG6765- and LOLA BTB domains.

      The data regarding the hexameric organisation is convincing. However, interpreting the role of specific regions of the BTB domain is difficult because the description of the molecular contacts lacks depth.

      Heteromeric interactions between TTK-type BTB domains:

      The authors use yeast two-hybrid assays to study heteromeric interactions between various Drosophila TTK-type BTB domains. Such assays are notorious for producing false positives, and this needs to be mentioned. Although the authors suggest that the heteromeric interactions are mediated via the newly identified B3 interaction interface, there is no evidence to support this, since mutation of B3 yielded insoluble proteins.

      We are aware that Y2H can give false-positive results when the BTB domain fused to the DNA-binding domain can itself activate reporter genes. Therefore, all tested BTB domains were examined for their ability to activate transcription. Furthermore, in our study, assays with non-TTK-type BTB domains, which showed almost no interactions, provide an additional negative control. We have added a corresponding disclaimer to the text. We agree that our data do not explain the basis for heteromeric interactions. Designing mutations in the B3 beta-sheet proved complicated, and using biochemical methods to study the principles of heteromer assembly also does not seem feasible, since most TTK-type BTBs tend to form aggregates and are difficult to express and purify. The most important issue, however, is that demonstrating heteromer assembly through B3 in a few tested pairs would not generalize to all pairs, since some of them may still use a different mechanism. We used AlphaFold to predict possible mechanisms of heteromer assembly. AlphaFold suggested that both the B3 and the conventional dimerization interfaces can be used for heteromeric interactions, with different pairs preferring one over the other. Thus, the presence of two potential heteromerization interfaces most likely extends the heteromerization capability of these domains. We changed the text accordingly.

      Evolution of the TTK-type BTB domains:

      The authors carried out a bioinformatics analysis of BTB proteins and showed that most of the Drosophila BTB transcription factors (24 out of 28) are of the TTK-type. They investigated how the TTK-type BTB domains emerged during evolution, and showed that these are only found in Arthropoda, and underwent lineage-specific expansion in the modern phylogenetic groups of insects. These findings are well-supported by the evidence.

    1. Author response:

      Reviewer #1 - Public Review

      This report describes work aiming to delineate multi-modal MRI correlates of psychopathology from a large cohort of children of 9-11 years from the ABCD cohort. While uni-modal characterisations have been made, the authors rightly argue that multi-modal approaches in imaging are vital to comprehensively and robustly capture modes of large-scale brain variation that may be associated with pathology. The primary analysis integrates structural and resting-state functional data, while post-hoc analyses on subsamples incorporate task and diffusion data. Five latent components (LCs) are identified, with the first three, corresponding to p-factor, internal/externalising, and neurodevelopmental Michelini Factors, described in detail. In addition, associations of these components with primary and secondary RSFC functional gradients were identified, and LCs were validated in a replication sample via assessment of correlations of loadings.

      1.1) This work is clearly novel and a comprehensive study of associations within this dataset. Multi-modal analyses are challenging to perform, but this work is methodologically rigorous, with careful implementation of discovery and replication assessments, and primary and exploratory analyses. The ABCD dataset is large, and behavioural and MRI protocols seem appropriate and extensive enough for this study. The study lays out comprehensive associations between MRI brain measures and behaviour that appear to recapitulate the established hierarchical structure of psychopathology.

      We thank Reviewer 1 for appreciating our methods and findings, and we address their suggestions below:

      1.2) The work does have weaknesses, some of them acknowledged. There is limited focus on the strength of observed associations. While the latent component loadings seem reliably reproducible in the behavioural domain, this is considerably less the case in the imaging modalities. A considerable proportion of statistical results focuses on spatial associations in loadings between modalities - it seems likely that these reflect intrinsic correlations between modalities, rather than associations specific to any latent component.

      We appreciate the Reviewer’s comment, and have minimized the reporting of correlations between the loadings from the different modalities in the revised Results (specifically subsections on LC1, LC2, and LC3). We now refer to Table S4 in each subsection for this information: “Spatial correlations between modality-specific loadings are reported in Supplementary file 1c.”

      For completeness, we report the intrinsic correlations between the different modalities in Supplementary file 1c (P.19):

      “Lastly, although the current work aimed to reduce intrinsic correlations between variables within a given modality through running a PCA before the PLS approach, intrinsic correlations between measures and modalities may potentially be a remaining factor influencing the PLS solution. We, thus, provided an additional overview of the intrinsic correlations between the different neuroimaging data modalities in the supporting results (Supplementary file 1c).”

      1.3) Assessment of associations with functional gradients is similarly a little hard to interpret. Thus, it is hard to judge the implications for our understanding of the neurophysiological basis of psychopathology and the ability of MRI to provide clinical tools for, say, stratification.

      We now provide additional context, including a rising body of theoretical and empirical work, that outlines the value of functional gradients and cortical hierarchies in the understanding of brain development and psychopathology. Please see P.26.

      “Initially demonstrated at the level of intrinsic functional connectivity (Margulies et al., 2016), follow up work confirmed a similar cortical patterning using microarchitectural in-vivo MRI indices related to cortical myelination (Burt et al., 2018; Huntenburg et al., 2017; Paquola et al., 2019), post-mortem cytoarchitecture (Goulas et al., 2018; Paquola et al., 2020, 2019), or post-mortem microarray gene expression (Burt et al., 2018). Spatiotemporal patterns in the formation and maturation of large-scale networks have been found to follow a similar sensory-to-association axis; moreover, there is the emerging view that this framework may offer key insights into brain plasticity and susceptibility to psychopathology (Sydnor et al., 2021). In particular, the increased vulnerability of transmodal association cortices in late childhood and early adolescence has been suggested to relate to prolonged maturation and potential for plastic reconfigurations of these systems (Paquola et al., 2019; Park et al., 2022b). Between mid-childhood and early adolescence, heteromodal association systems such as the default network become progressively more integrated among distant regions, while being more differentiated from spatially adjacent systems, paralleling the development of cognitive control, as well as increasingly abstract and logical thinking. [...] This suggests that neurodevelopmental difficulties might be related to alterations in various processes underpinned by sensory and association regions, as well as the macroscale balance and hierarchy of these systems, in line with previous findings in several neurodevelopmental conditions, including autism, schizophrenia, as well as epilepsy, showing a decreased differentiation between the two anchors of this gradient (Hong et al., 2019). In future work, it will be important to evaluate these tools for diagnostics and population stratification. 
In particular, the compact and low dimensional perspective of gradients may prove beneficial in terms of biomarker reliability as well as phenotypic prediction, as previously demonstrated using typically developing cohorts (Hong et al., 2020). On the other hand, it will be of interest to explore how far alterations in connectivity along sensory-to-transmodal hierarchies provide sufficient graduality to differentiate between specific psychopathologies, or whether they, as the current work suggests, mainly reflect risk for general psychopathology and atypical development.”

      1.4) The observation of a recapitulation of psychopathology hierarchy may be somewhat undermined by the relatively modest strength of the components in the imaging domain.

      We thank the Reviewer for this comment, and have now expressed this limitation in the revised Discussion, P.23.

      “The p factor, internalizing, externalizing, and neurodevelopmental dimensions were each associated with distinct morphological and intrinsic functional connectivity signatures, although these relationships varied in strength.”

      1.5) The task fMRI was assessed with a fairly basic functional connectivity approach, not using task timings to more specifically extract network responses.

      In the revised Discussion on P.24, we acknowledge that more in-depth analyses of task-based fMRI may have offered additional insights into state-dependent changes in functional architecture.

      “While the current work derived main imaging signatures from resting-state fMRI as well as grey matter morphometry, we could nevertheless demonstrate associations to white matter architecture (derived from diffusion MRI tractography) and recover similar dimensions when using task-based fMRI connectivity. Despite subtle variations in the strength of observed associations, the latter finding provided additional support that the different behavioral dimensions of psychopathology more generally relate to alterations in functional connectivity. Given that task-based fMRI data offers numerous avenues for analytical exploration, our findings may motivate follow-up work assessing associations to network- and gradient-based response strength and timing with respect to external stimuli across different functional states.”

      1.6) Overall, the authors achieve their aim to provide a detailed multimodal characterisation of MRI correlations of psychopathology. Code and data are available and well organised and should provide a valuable resource for researchers wanting to understand MRI-based neural correlates of psychopathology-related behavioural traits in this important age group. It is largely a descriptive study, with comparisons to previous uni-modal work, but without particularly strong testing of neuroscience hypotheses.

      We thank the Reviewer for recognizing the detail and rigor of our data-driven study and the extensive code and data documentation.

      Reviewer #2 - Public Review

      In "Multi-modal Neural Correlates of Childhood Psychopathology" Kebets et al. integrate multi-modal neuroimaging data using machine learning to delineate dissociable links to diverse dimensions of psychopathology in the ABCD sample. This paper had numerous strengths including a superb use of a large resource dataset, appropriate analyses, beautiful visualizations, clear writing, and highly interpretable results from a data-driven analysis. Overall, I think it would certainly be of interest to a general readership. That being said, I do have several comments for the authors to consider.

      We thank Dr Satterthwaite for the positive evaluation and helpful comments.

      2.1) Out-of-sample testing: while the permutation testing procedure for the PLS is entirely appropriate, without out-of-sample testing the reported effect sizes are likely inflated.

      As discussed in the editorial summary of essential revisions, we agree that out-of-sample prediction indeed provides stronger estimates of generalizability. We assess this by applying the PCA coefficients derived from the discovery cohort imaging data to the replication cohort imaging data. The resulting PCA scores and behavioral data were then z-scored using the mean and standard deviation of the replication cohort. The SVD weights derived from the discovery cohort were applied to the normalized replication cohort data to derive imaging and behavioral composite scores, which were used to recover the contribution of each imaging and behavioral variable to the LCs (i.e., loadings). Out-of-sample replicability of imaging (mean r=0.681, S.D.=0.131) and behavioral (mean r=0.948, S.D.=0.022) loadings was generally high across LCs 1-5. This analysis is reported in the revised manuscript (P.18).

      “Generalizability of reported findings was also assessed by directly applying PCA coefficients and latent components weights from the PLS analysis performed in the discovery cohort to the replication sample data. Out-of-sample prediction was overall high across LCs1-5 for both imaging (mean r=0.681, S.D.=0.131) and behavioral (mean r=0.948, S.D.=0.022) loadings.”
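
      The out-of-sample procedure described in this response can be sketched as follows. This is a hedged toy example on simulated cohorts, not the study's code: the `simulate` helper, the sample sizes, the choice of 10 principal components, and the definition of loadings as correlations between original variables and composite scores are all assumptions made for illustration.

```python
import numpy as np


def zscore(a):
    """Column-wise z-score using the cohort's own mean and SD."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def loadings(data, scores):
    """Pearson correlation of every column of `data` with every column of `scores`."""
    return zscore(data).T @ zscore(scores) / (len(data) - 1)


def simulate(n, rng, n_img=30, n_beh=8):
    """Toy cohort: one latent factor drives part of the imaging and behavioral data."""
    latent = rng.normal(size=(n, 1))
    w_img = np.linspace(1.0, 0.0, n_img)[None, :]
    w_beh = np.linspace(1.0, 0.2, n_beh)[None, :]
    imaging = latent * w_img + rng.normal(size=(n, n_img))
    behavior = latent * w_beh + rng.normal(size=(n, n_beh))
    return imaging, behavior


rng = np.random.default_rng(0)
img_disc, beh_disc = simulate(500, rng)   # discovery cohort
img_rep, beh_rep = simulate(400, rng)     # replication cohort

# Discovery: PCA on imaging data, then PLS as an SVD of the cross-covariance.
_, _, vt = np.linalg.svd(img_disc - img_disc.mean(axis=0), full_matrices=False)
pca_coef = vt[:10].T                      # imaging variables x components
X_disc = zscore(img_disc @ pca_coef)
Y_disc = zscore(beh_disc)
U, s, Vt = np.linalg.svd(Y_disc.T @ X_disc, full_matrices=False)
beh_w, img_w = U, Vt.T                    # behavioral and imaging weights

# Replication: reuse the discovery PCA coefficients and PLS weights,
# but normalize with the replication cohort's own mean and SD.
X_rep = zscore(img_rep @ pca_coef)
Y_rep = zscore(beh_rep)

# Loadings of the first latent component in each cohort.
img_load_disc = loadings(img_disc, X_disc @ img_w)[:, 0]
img_load_rep = loadings(img_rep, X_rep @ img_w)[:, 0]
beh_load_disc = loadings(beh_disc, Y_disc @ beh_w)[:, 0]
beh_load_rep = loadings(beh_rep, Y_rep @ beh_w)[:, 0]

r_img = float(np.corrcoef(img_load_disc, img_load_rep)[0, 1])
r_beh = float(np.corrcoef(beh_load_disc, beh_load_rep)[0, 1])
print(f"imaging loadings r = {r_img:.3f}, behavioral loadings r = {r_beh:.3f}")
```

      Because no parameter is re-estimated on the replication cohort apart from its normalization, the correlation between the two sets of loadings reflects how well the discovery solution transfers.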

      2.2) Site/family structure: it was unclear how site/family structure were handled as covariates.

      Only unrelated participants were included in discovery and replication samples (see P.6). The site variable was regressed out of the imaging and behavioral data prior to the PLS analysis using the residuals from a multiple linear model which also included age, age², sex, and ethnicity. This is now clarified on P.29:

      “Prior to the PLS analysis, effects of age, age², sex, site, and ethnicity were regressed out from the behavioral and imaging data using a multiple linear regression to ensure that the LCs would not be driven by possible confounders (Kebets et al., 2021, 2019; Xia et al., 2018). The imaging and behavioral residuals of this procedure were input to the PLS analysis.”
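
      As an illustration of the confound regression described above, the sketch below residualizes a data matrix against a design matrix of covariates. It is not the authors' code: the function names are hypothetical, and the dummy coding of site and ethnicity (first level as reference) is an assumption.

```python
import numpy as np


def build_design(age, sex, site, ethnicity):
    """Hypothetical design matrix: intercept, age, age squared, sex, dummy-coded site and ethnicity."""
    age = np.asarray(age, dtype=float)
    cols = [np.ones_like(age), age, age ** 2, np.asarray(sex, dtype=float)]
    for labels in (np.asarray(site), np.asarray(ethnicity)):
        # Drop the first level of each categorical variable as the reference category.
        for level in np.unique(labels)[1:]:
            cols.append((labels == level).astype(float))
    return np.column_stack(cols)


def regress_out(data, design):
    """Return residuals of an ordinary least-squares fit of each data column on the design."""
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta
```

      The residuals are orthogonal to every column of the design matrix, so any association later found by the PLS cannot be a linear effect of the listed covariates.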

      2.3) Anatomical features: I was a bit surprised to see volume, surface area, and thickness all evaluated - and that there were several comments on the correspondence between the SA and volume in the results section. Given that cortical volume is simply a product of SA and CT (and mainly driven by SA), this result may be pre-required.

      As suggested, we reduced the reporting of correlations between the loadings from the different modalities in the revised Results (specifically subsections on LC1, LC2, and LC3). Instead, we now refer to Table S4 in each subsection for this information: “Spatial correlations between modality-specific loadings are reported in Supplementary file 1c.”

      We also reran the PLS analysis while only including thickness and surface area as our structural metrics, to account for potential redundancy of these measures with volume. This analysis and associated findings are reported on P.36 and P.19:

      “As cortical volume is a result of both thickness and surface area, we repeated our main PLS analysis while excluding cortical volume from our imaging metrics and report the consistency of these findings with our main model.”

      “Third, to account for redundancy within structural imaging metrics included in our main PLS model (i.e., cortical volume is a result of both thickness and surface area), we also repeated our main analysis while excluding cortical volume from our imaging metrics. Findings were very similar to those in our main analysis, with an average absolute correlation of 0.898±0.114 across imaging composite scores of LCs 1-5.”
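
      Summaries of the form "average absolute correlation ± S.D. across LCs" can be computed as in the sketch below. The function name is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption. Absolute values are taken because the sign of an SVD-derived component is arbitrary.

```python
import numpy as np


def composite_score_agreement(scores_a, scores_b):
    """Mean and SD of |Pearson r| between matching LC columns of two models."""
    r = np.array([
        np.corrcoef(scores_a[:, k], scores_b[:, k])[0, 1]
        for k in range(scores_a.shape[1])
    ])
    abs_r = np.abs(r)  # component signs are arbitrary, so compare magnitudes
    return abs_r.mean(), abs_r.std(ddof=1), abs_r
```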

      2.4) Ethnicity: the rationale for regressing ethnicity from the data was unclear and may conflict with current best practices.

      We thank the Reviewer for this comment. In light of recent discussions on including this covariate in large datasets such as ABCD (e.g., Saragosa-Harris et al., 2022), we elaborate on our rationale for including this variable in our model in the revised manuscript on P.30:

      “Of note, the inclusion of ethnicity as a covariate in imaging studies has been recently called into question. In the present study, we included this variable in our main model as a proxy for social inequalities relating to race and ethnicity alongside biological factors (age, sex) with documented effects on brain organization and neurodevelopmental symptomatology queried in the CBCL.”

      We also assess the replicability of our analyses when removing race and ethnicity covariates prior to computing the PLS analysis and correlating imaging and behavioral composite scores across both models. We report resulting correlations in the revised manuscript (P.37, 19, and 27):

      “We also assessed the replicability of our findings when removing race and ethnicity covariates prior to computing the PLS analysis and correlating imaging and behavioral composite scores across both models.”

      “Moreover, repeating the PLS analysis while excluding this variable as a model covariate yielded overall similar imaging and behavioral composite scores across LCs to our original analysis. Across LCs 1-5, the average absolute correlations reached r=0.636±0.248 for imaging composite scores, and r=0.715±0.269 for behavioral composite scores. Removing these covariates seemed to exert stronger effects on LC3 and LC4 for both imaging and behavior, as lower correlations across models were specifically observed for these components.”

      “Although we could consider some socio-demographic variables and proxies of social inequalities relating to race and ethnicity as covariates in our main model, the relationship of these social factors to structural and functional brain phenotypes remains to be established with more targeted analyses.”

      2.5) Data quality: the authors did an admirable job in controlling for data quality in the analyses of functional connectivity data. However, it is unclear if a comparable measure of data quality was used for the T1/dMRI analyses. This likely will result in inflated effect sizes in some cases; it has the potential to reduce sensitivity to real effects.

      We agree that data quality was not accounted for in our analysis of T1w- and diffusion-derived metrics. We have now accounted for T1w image quality by adding manual quality control ratings to the regressors applied to all structural imaging metrics prior to performing the PLS analysis, and report the consistency of this new model with original findings. See P.36, P.19:

      “We also considered manual quality control ratings as a measure of T1w scan quality. This metric was included as a covariate in a multiple linear regression model accounting for potential confounds in the structural imaging data, in addition to age, age², sex, site, ethnicity, ICV, and total surface area. Downstream PLS results were then benchmarked against those obtained from our main model.”

      “Considering scan quality in T1w-derived metrics (from manual quality control ratings) yielded similar results to our main analysis, with an average correlation of 0.986±0.014 across imaging composite scores.”

      As for diffusion imaging, we also regressed out effects of head motion in addition to age, age², sex, site, and ethnicity from FA and MD measures and reported the consistency with our original results (P.36, P.19):

      “We tested another model which additionally included head motion parameters as regressors in our analyses of FA and MD measures, and assessed the consistency of findings from both models.”

      “Additionally considering head motion parameters from diffusion imaging metrics in our model yielded results consistent with those in our main analyses (mean r=0.891, S.D.=0.103; r=0.733-0.998).”

      Reviewer #3 - Public Review

      In this study, the authors utilized the Adolescent Brain Cognitive Development dataset to investigate the relationship between structural and functional brain network patterns and dimensions of psychopathology. They identified multiple components, including a general psychopathology (p) factor that exhibited a strong association with multimodal imaging features. The connectivity signatures associated with the p factor and neurodevelopmental dimensions aligned with the sensory-to-transmodal axis of cortical organization, which is linked to complex cognition and psychopathology risk. The findings were consistent across two separate subsamples and remained robust when accounting for variations in analytical parameters, thus contributing to a better understanding of the biological mechanisms underlying psychopathology dimensions and offering potential brain-based vulnerability markers.

      3.1) An intriguing aspect of this study is the integration of multiple neuroimaging modalities, combining structural and functional measures, to comprehensively assess the covariance with various symptom combinations. This approach provides a multidimensional understanding of the risk patterns associated with mental illness development.

      We thank the Reviewer for acknowledging the multimodal approach, and for the constructive suggestions.

      3.2) The paper delves deeper into established behavioral latent variables such as the p factor, internalizing, externalizing, and neurodevelopmental dimensions, revealing their distinct associations with morphological and intrinsic functional connectivity signatures. This sheds light on the neurobiological underpinnings of these dimensions.

      We are happy to hear the Reviewer appreciates the gain in understanding neural underpinnings of dimensions of psychopathology resulting from the current work.

      3.3) The robustness of the findings is a notable strength, as they were validated in a separate replication sample and remained consistent even when accounting for different parameter variations in the analysis methodology. This reinforces the generalizability and reliability of the results.

      We appreciate that the Reviewer found our robustness and generalizability assessment convincing.

      3.4) Based on their findings, the authors suggest that the observed variations in resting-state functional connectivity may indicate shared neurobiological substrates specific to certain symptoms. However, it should be noted that differences in resting-state connectivity between groups can stem from various factors, as highlighted in the existing literature. For instance, discrepancies in the interpretation of instructions during the resting state scan can influence the results. Hence, while their findings may indicate biological distinctions, they could also reflect differences in behavior.

      For the ABCD dataset, resting-state fMRI scans were based on eyes open and passive viewing of a crosshair, and are thus homogenized. We acknowledge, however, that there may still be state-to-state fluctuations contributing to the findings, and this is now discussed in the revised Discussion, on P.28. Note, however, that prior literature has generally also suggested rather modest impacts of cognitive and daily variation on resting-state functional networks, compared to much more dominating inter-individual and inter-group factors.

      “Finally, while prior research has shown that resting-state fMRI networks may be affected by differences in instructions and study paradigm (e.g., with respect to eyes open vs closed) (Agcaoglu et al., 2019), the resting-state fMRI paradigm is homogenized in the ABCD study to be passive viewing of a centrally presented fixation cross. It is nevertheless possible that there were slight variations in compliance and instructions that contributed to differences in associated functional architecture. Notably, however, there is a mounting literature based on high-definition fMRI acquisitions suggesting that functional networks are mainly dominated by common organizational principles and stable individual features, with substantially more modest contributions from task-state variability (Gratton et al. 2018). These findings, thus, suggest that resting-state fMRI markers can serve as powerful phenotypes of psychiatric conditions, and potential biomarkers (Abraham et al., 2017; Gratton et al., 2020; Parkes et al., 2020).”

      3.5) The authors conducted several analyses to investigate the relationship between imaging loadings associated with latent components and the principal functional gradient. They found several associations between principal gradient scores and both within- and between-network resting-state functional connectivity (RSFC) loadings. Assessing the analysis presented here proves challenging due to the nature of relating loadings, which are partly based on the RSFC, to gradients derived from RSFC. Consequently, a certain level of correlation between these two variables would be expected, making it difficult to determine the significance of the authors' findings. It would be more intriguing if a direct correlation between the composite scores reflecting behavior and the gradients were to yield statistically significant results.

      We thank the Reviewer for the comment, and agree that investigating gradient-behavior relationships could offer additional insights into the neural basis of psychiatric symptomatology. However, the current analysis pipeline precludes this direct comparison which is performed on a region-by-region basis across the span of the cortical gradient. Indeed, the behavioral loadings are provided for each CBCL item, and not cortical regions.

      The Reviewer also evokes concerns of potential circularity in our analysis, as we compared imaging loadings, which are partially based on RSFC, and gradient values generated from the same RSFC data. In response to this comment, we cross-validated our findings using an RSFC gradient derived from an independent dataset (HCP), showing highly consistent findings to those presented in the manuscript. This correlation is now reported in the Results section P.15.

      “A similar pattern of findings was observed when cross-validating between- and within-network RSFC loadings to a RSFC gradient derived from an independent dataset (HCP), with strongest correlations seen for between-network RSFC loadings for LC1 and LC3 (LC1: r=0.50, pspin<0.001; LC3: r=0.37, pspin<0.001).”

      We furthermore note similar correlations between imaging loadings and T1w/T2w ratio in the same participants, a proxy of intracortical microstructure and hierarchy (Glasser et al., 2011). These findings are now detailed in the revised Results, P.15-16:

      “Of note, we obtain similar correlations when using T1w/T2w ratio in the same participants, a proxy of intracortical microstructure and hierarchy (Glasser et al., 2011). Specifically, we observed the strongest association between this microstructural marker of the cortical hierarchy and between-network RSFC loadings related to LC1 (r=-0.43, pspin<0.001).”

      3.6) Lastly, regarding the interpretation of the first identified latent component, I have some reservations. Upon examining the loadings, it appears that LC1 primarily reflects impulse control issues rather than representing a comprehensive p-factor. Furthermore, it is worth noting that within the field, there is an ongoing debate concerning the interpretation and utilization of the p-factor. An insightful publication on this topic is "The p factor is the sum of its parts, for now" (Fried et al, 2021), which explains that the p-factor emerges as a result of a positive manifold, but it does not necessarily provide insights into the underlying mechanisms that generated the data.

      We thank the Reviewer for this comment, and added greater nuance into the discussion of the association to the p factor. We furthermore discuss some of the ongoing debate about the use of the p factor, and cite the recommended publication on P.27.

      “Other factors have also been suggested to impact the development of psychopathology, such as executive functioning deficits, earlier pubertal timing, negative life events (Brieant et al., 2021), maternal depression, or psychological factors (e.g., low effortful control, high neuroticism, negative affectivity). Inclusion of such data could also help to add further mechanistic insights into the rather synoptic proxy measure of the p factor itself (Fried et al., 2021), and to potentially assess shared and unique effects of the p factor vis-à-vis highly correlated measures of impulse control.”

    1. Author response:

      Reviewer #2 (Public Review):

      This is, to my knowledge, the most scalable method for phylogenetic placement that uses likelihoods. The tool has an interesting and innovative means of using gaps, which I haven’t seen before. In the validation the authors demonstrate superior performance to existing tools for taxonomic annotation (though there are questions about the setup of the validation as described below).

      The program is written in C with no library dependencies. This is great. However, I wasn’t able to try out the software because the linking failed on Debian 11, and the binary artifact made by the GitHub Actions pipeline was too recent for my GLIBC/kernel. It’d be nice to provide a binary for people stuck on older kernels (our cluster is still on Ubuntu 18.04). Also, would it be hard to publish your .zipped binaries as packages?

      We have provided a binary (and zipped package) that supports Ubuntu 18.04 in GitHub Actions (https://github.com/lpipes/tronko/actions/runs/9947708087). This should facilitate the use of our software on older systems like yours. We were not able to test the binary, however, since GitHub did not seem to find any nodes with Ubuntu 18.04. It is important to note that Ubuntu 18.04 is deprecated. The latest version of Ubuntu is 24.04, and we recommend that users upgrade to newer, supported versions of their operating systems to benefit from the latest security updates and features.

      Thank you for publishing your source files for the validation on zenodo. Please provide a script that would enable the user to rerun the analysis using those files, either on zenodo or on GitHub somewhere.

      We have posted all datasets as well as scripts to Zenodo.

      The validations need further attention as follows.

      First, the authors have chosen data sets that are not well-aligned with real-world use cases for this software, and as a result, its applicability is difficult to determine. Specifically, the leave-one-species-out experiment made use of COI gene sequences representing 253 species from the order Charadriiformes, which includes bird species such as gulls and terns. What is the reasoning for selecting this data set given the objective of demonstrating the utility of Tronko for large scale community profiling experiments which by their nature tend to include microorganisms as subjects? If the authors are interested in evaluating COI (or another gene target) as a marker for characterizing the composition of eukaryotic populations, is the heterogeneity and species distribution of bird species within order Charadriiformes comparable to what one would expect in populations of organisms that might actually be the target of a metagenomic analysis?

      Our reasoning for selecting Charadriiformes is that these species are often misidentified for each other and there is a heavy reliance on COI for their species identification. This choice allows us to demonstrate Tronko’s ability to handle difficult and realistic identification challenges. Additionally, we aimed to simulate a challenging dataset to effectively differentiate between the methods used, showcasing Tronko’s robustness. Including more distantly related bird species would have simplified the identification process, which would not serve our objective of demonstrating the utility of Tronko for distinguishing closely related species. It is also important to note that all methods used the exact same reference database which is not always the case in other species assignment comparative studies.

      Furthermore, while our study uses bird species, the principles and techniques applied are broadly applicable to other taxa, including microorganisms. By selecting a dataset known for its identification difficulties, we underscore Tronko’s potential utility in a wide range of taxonomic profiling scenarios, including those involving high heterogeneity and closely related species, such as in microbial communities.

      Second, it appears that experiments evaluating performance for 16S were limited to reclassification of sequencing data from mock communities described in two publications, Schirmer (2015, 49 bacteria and 10 archaea, all environmental), and Gohl (2016; 20 bacteria - this is the widely used commercial mock community from BEI, all well-known human pathogens or commensals). The authors performed a comparison with kraken2, metaphlan2, and MEGAN using both the default database for each as well as the same database used for Tronko (kudos for including the latter). This pair of experiments provide a reasonable high-level indication of Tronko’s performance relative to other tools, but the total number of organisms is very limited, and particularly limited with respect to the human microbiome. It is also important to point out that these mock communities are composed primarily of type strains and provide limited species-level heterogeneity. The performance of these classification tools on type strains may not be representative of what one would find in natural samples. Thus, the leave-one-individual-out and leave-one-species-out experiments would have been more useful and informative had they been applied to extended 16S data sets representing more ecologically realistic populations.

      We thank the reviewer for this comment and we have included both an additional bacterial mock community dataset from Lluch et al. (2015) and an additional leave-one-species-out experiment. We describe how this leave-one-species-out dataset was constructed in our previous response to ’Essential Revisions’ #1. We also added Figure 5, S5, and S6.

      Finally, the authors should describe the composition of the databases used for classification as well as the strategy (and toolchain) used to select reference sequences. What databases were the reference sequences drawn from and by what criteria? Were the reference databases designed to reflect the composition of the mock communities (and if so, are they limited to species in those communities, or are additional related species included), or have the authors constructed general purpose reference databases? How many representatives of each species were included (on average), and were there efforts to represent a diversity of strains for each species? The methods should include a section detailing the construction of the data sets: as illustrated in this very study, the choice of reference database influences the quality of classification results, and the authors should explain the process and design considerations for database construction.

      To construct our databases, we used CRUX (Curd et al., 2018). This is described in the Methods section under ’Custom 16S and COI Tronko-build reference database construction’. All leave-out tests used downsampled versions of these two databases. It is beyond the scope of the manuscript to discuss how CRUX works. Additionally, we added the following text:

      To compare the new method (Tronko) to previous methods, we constructed reference databases for COI and 16S for common amplicon primer sets using CRUX (See Methods for exact primers used).

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Perez-Lopez et al. examine the function of the chemokine CCL28, which is expressed highly in mucosal tissues during infection, but its role during infection is poorly understood. They find that CCL28 promotes neutrophil accumulation in the intestines of mice infected with Salmonella and in the lungs of mice infected with Acinetobacter. They find that Ccl28-/- mice are highly susceptible to Salmonella infection, and highly resistant and protected from lethality following Acinetobacter infection. They find that neutrophils express the CCL28 receptors CCR3 and CCR10. CCR3 was pre-formed and intracellular and translocated to the cell surface following phagocytosis or inflammatory stimuli. They also find that CCL28 stimulation of CCR3 promoted neutrophil antimicrobial activity, ROS production, and NET formation, using a combination of primary mouse and human neutrophils for their studies. Overall, the authors' findings provide new and fundamental insight into the role of the CCL28:CCR3 chemokine:chemokine receptor pair in regulating neutrophil recruitment and effector function during infection with the intestinal pathogen Salmonella Typhimurium and the lung pathogen Acinetobacter baumannii.

      We would like to thank the reviewer for their positive assessment of our work and for providing us with constructive comments that have helped us to improve the manuscript.

      Reviewer #2 (Public Review):

      In this manuscript by Perez-Lopez et al., the authors investigate the role of the chemokine CCL28 during bacterial infections in mucosal tissues. This is a well-written study with exciting results. They show a role for CCL28 in promoting neutrophil accumulation to the guts of Salmonella-infected mice and to the lung of mice infected with Acinetobacter. Interestingly, the functional consequences of CCL28 deficiency differ between infections with the two different pathogens, with CCL28-deficiency increasing susceptibility to Salmonella, but increasing resistance to Acinetobacter. The underlying mechanistic reasons for this suggest roles for CCL28 in enhanced neutrophil antimicrobial activity, production of reactive oxygen species, and formation of extracellular traps. However, additional experiments are required to shore up these mechanisms, including addressing the role of other CCL28-dependent cell types and further characterization of neutrophils from CCL28-deficient mice.

      We would like to thank the reviewer for the positive assessment of our work and for providing us with constructive comments that have helped us to improve the manuscript.

      Reviewer #3 (Public Review):

      The manuscript by Perez-Lopez and colleagues uses a combination of in vivo studies using knockout mice and elegant in vitro studies to explore the role of the chemokine CCL28 during bacterial infection on mucosal surfaces. Using the streptomycin model of Salmonella Typhimurium (S. Tm) infection, the authors demonstrate that CCL28 is required for neutrophil influx in the intestinal mucosa to control pathogen burden both locally and systemically. Interestingly, CCL28 plays the opposite role in a model lung infection by Acinetobacter baumannii, as Ccl28-/- mice are protected from Acinetobacter infection. Authors suggest that the mechanism by which CCL28 plays a role during bacterial infection is due to its role in modulating neutrophil recruitment and function.

      We would like to thank the reviewer for the positive assessment of our work and for providing us with constructive comments that have helped us to improve the manuscript.

      The major strengths of the manuscript are:

      The novelty of the findings that are described in the manuscript. The role of the chemokine CCL28 in modulating neutrophil function and recruitment in mucosal surfaces is intriguing and novel.

      Authors use Ccl28-/- mice in their studies, a mouse strain that has only recently been available. To assess the impact of CCL28 on mucosal surfaces during pathogen-induced inflammation, the authors choose not one but two models of bacterial infection (S. Tm and A. baumannii). This approach increases the rigor and impact of the data presented.

      Authors combine the elegant in vivo studies using Ccl28 -/- with in vitro experiments that explore the mechanisms by which CCL28 affects neutrophil function.

      The major weaknesses of the manuscript in its present form are:

      Authors use different time points in the S. Tm model to characterize the influx of immune cells and pathology. They do not provide a clear justification as to why distinct time points were chosen for their analysis.

      The reviewer raises a good point. As discussed in the detailed response to the reviewers, we have now generated extensive results at different time points and included these in the revised manuscript.

      Authors provide puzzling data that Ccl28-/- mice have the same numbers of CCR3- and CCR10-expressing neutrophils in the mucosa during infection. It is unclear why the lack of CCL28 expression would not affect the recruitment of neutrophils that express the receptors (CCR3 and CCR10) for this chemokine. Thus, these results need to be better explained.

      As discussed in the detailed response to the reviewers, we clarified that Ccl28-/- mice have reduced numbers of neutrophils in the mucosa during infection, but the percentage of CCR3+ and CCR10+ neutrophils does not change. We provide additional discussion of this point in the manuscript and in the response to the reviewers.

      The in vitro studies focus primarily on characterizing how CCL28 affects the function of neutrophils in response to S. Tm infection. There is a lack of data to demonstrate whether Acinetobacter affects CCR3 and CCR10 expression and recruitment to the cell surface and whether CCL28 plays any role in this process.

      We agree and have performed additional studies with Acinetobacter and CCL28, which we discuss in greater detail below in the response to the reviewers.

    1. Author response:

      We appreciate the time of the reviewers and their detailed comments, which will help to improve the manuscript.

      We are sorry that at least one reviewer seems to have had the impression that we have conflated issues about gonadal and non-gonadal sex phenotypes. This referee suggests that we should use Sharpe et al. (2023) to develop our concepts. However, what is discussed in Sharpe et al. was already the guiding principle for our study (without knowing this paper before). In our paper, we introduce the gonadal binary sex (which is self-evidently also the basis for creating the dataset in the first place, because we needed to separate males from females) and go then on to the question of (adult) sex phenotypes for the rest of the paper. The gonadal data are included only as comparison for contrasting the patterns in the non-gonadal tissues.

      Our study presents the largest systematic dataset so far on the evolution of sex-biased gene expression. It is also the first that explores the patterns of individual variation in sex-biased gene expression and the SBI is an entirely new procedure to directly visualize these variance patterns in an intuitive way (note that the relative position of the distributions along the X-axis is indeed not relevant). The results are actually quite nuanced (e.g. the rather dynamic changes seen in mouse kidney and liver comparisons) and go certainly beyond what would have been predictable based on the current literature.

      Also, we should like to point out that our study contradicts recent conclusions that were published in high profile journals, that had suggested that a substantial set of sex-biased genes has conserved functions between humans and mice and that mice can therefore be informative for gender-specific medicine studies. Our data suggest that only a very small set of genes are conserved in their sex-biased expression. These are epigenetic regulator genes and it will therefore be interesting in the future to focus on their roles in generating the differences between sexual phenotypes in given species.

      We will be happy to use the referee comments to clarify all of these points in a revised version. But we do not think that our "evidence is incomplete" and that there are several "overstated key conclusions". We have used all canonical statistical analyses that are typically used in papers of sex-biased gene expression, as acknowledged by reviewers 1 and 2. The additional statistical analyses that are requested are not within the scope of such papers, but could be subject to separate general studies, independent of the sex-bias analysis (e.g. the role of highly expressed genes versus low expressed genes, or the analysis of the fraction of neutrally evolving loci).

      Finally, it is unclear why the overall rating of the paper is at the lowest possible category ("useful study"), given that it adds a substantial amount of data and new insights into the exploration of the non-binary nature of sexual phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given that KRAS inhibition approaches are a relatively new innovation and that resistance is now being observed to such therapies in patients with NSCLC, investigation of combination therapies is valuable. The manuscript furthers our understanding of combination therapy for KRAS mutant non-small cell lung cancer by providing evidence that combined inhibition of ULK1/2 (and therefore autophagy) and KRAS can inhibit KRAS-mutant lung cancer growth. The manuscript will be of interest to the lung cancer community but also to researchers in other cancer types where KRAS inhibition is relevant.

      Strengths:

      The manuscript combines cell line, cell line-derived xenograft, and genetically-engineered mouse model data to provide solid evidence for the proposed combination therapy.  The manuscript is well written, and experiments are broadly well performed and presented.

      We thank Reviewer #1 (R1) for the generally favorable review of our manuscript, and also for the more detailed critique that identifies potential weaknesses in the research, which we address on a point-by-point basis below. 

      Weaknesses:

      With 3-4 mice per group in many experiments, experimental power is a concern and some comparisons (e.g. mono vs combination therapy) seem to be underpowered to detect a difference. Both male and female mice are used in experiments which may increase variability.

      We thank R1 for pointing out concerns regarding statistical power in our various mouse models of NSCLC experiments, and agree that more mice per group would certainly increase statistical power.  However, there are certain logistical considerations that impact the generation of cohorts of experimental KrasLSL-G12C mice.  Because mice homozygous for the KrasLSL-G12C allele display embryonic lethality, we are required to generate experimental mice by crossing heterozygous male and female KrasLSL-G12C mice.  Although 66% of the progeny of such crosses are predicted to be KrasLSL-G12C/+, experience tells us that we only obtain ~40-50% heterozygous KrasLSL-G12C/+ mice with litter sizes around 6-8 mice from such crosses.  Therefore, there are usually only about 4 heterozygous KrasLSL-G12C mice per litter, which presents a substantial challenge in generating larger cohorts of age-matched mice suitable for experiments, especially under conditions where we wish to euthanize mice at multiple time points for analysis.  For the GEM model experiments, Figure 3B is the only experiment that has n=3.  All other experiments contain 4-6 mice per experimental condition.  We rationalized using both male and female mice because both human males and females have high lung cancer rates.
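      The expected genotype fractions quoted above follow from a simple Punnett-square count. The sketch below is illustrative only: the function names are hypothetical, and it assumes plain Mendelian segregation with complete loss of homozygous embryos, as stated in the response. It reproduces the predicted 2/3 heterozygous fraction among viable pups and then applies the observed 40-50% yield to litters of 6-8 mice.

```python
from fractions import Fraction
from itertools import product

# Homozygous KrasLSL-G12C embryos do not survive (stated in the response above).
LETHAL = "G12C/G12C"


def viable_genotype_fractions():
    """Cross two heterozygous (G12C/+) parents and drop the lethal genotype."""
    parent_alleles = ("G12C", "+")
    counts = {}
    for a, b in product(parent_alleles, repeat=2):
        # Sort so that "+/G12C" and "G12C/+" are counted as the same genotype.
        genotype = "/".join(sorted((a, b), reverse=True))
        counts[genotype] = counts.get(genotype, 0) + 1
    counts.pop(LETHAL, None)
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


def expected_hets(litter_size, het_fraction):
    """Expected number of heterozygous pups in a litter (hypothetical helper)."""
    return litter_size * het_fraction


if __name__ == "__main__":
    fracs = viable_genotype_fractions()
    print("Predicted heterozygous fraction among viable pups:", fracs["G12C/+"])
    # Observed yield (40-50%) and litter sizes (6-8) are taken from the response.
    for size in (6, 8):
        for observed in (Fraction(2, 5), Fraction(1, 2)):
            print(size, "pups at", observed, "->", float(expected_hets(size, observed)))
```

      With the observed yield, the expected number of usable mice per litter falls between about 2.4 and 4, which is consistent with the cohort-size constraint described above.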

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Ghazi et al. reported that inhibition of KRASG12C signaling increases autophagy in KRASG12C-expressing lung cancer cells. Moreover, the combination of DCC 3116, a selective ULK1/2 inhibitor, plus sotorasib displays cooperative/synergistic suppression of human KRASG12C-driven lung cancer cell proliferation in vitro and tumor growth in vivo. Additionally, in genetically engineered mouse models of KRASG12C-driven NSCLC, inhibition of either KRASG12C or ULK1/2 decreases tumor burden and increases mouse survival. Additionally, this study found that LKB1 deficiency diminishes the sensitivity of KRASG12C/LKB1Null-driven lung cancer to the combination treatment, perhaps through the emergence of mixed adeno/squamous cell carcinomas and mucinous adenocarcinomas.

      Strengths:

      Both human cancer cells and mouse models were employed in this study to illustrate that inhibiting ULK1/2 could enhance the responsiveness of KRASG12C lung cancer to sotorasib. This research holds translational importance.

      We thank Reviewer #2 (R2) for the generally favorable review of our manuscript, and also for the more detailed critique that identifies potential weaknesses in the research, which we address on a point-by-point basis below. 

      Weaknesses:

      Additional validation of certain data is necessary.

      (1) mCherry-EGFP-LC3 reporter was used to assess autophagy flux in Figure 1A. Please explain how autophagy status (high, medium, and low) was defined. It's also suggested to show WB of LC3 processing in different treatments as in Figure 1A at 48 hours.

      We thank the reviewer for this comment and agree that a more thorough description of how autophagy status is assessed using the Fluorescent Autophagy Reporter (FAR) would benefit the readers of our manuscript.  Cells engineered to express the FAR are analyzed by flow cytometry, in which we define autophagy status by gating viable (based on Sytox Blue staining), DMSO-treated control cells into three bins based on the ratio of EGFP:mCherry fluorescence.  We gate all live cells into the 33% highest EGFP-positive cells (autophagy low) and the 33% highest mCherry-positive cells (autophagy high); the proportion in the middle is therefore also approximately 33% and is considered the medium autophagy status.  Again, these gates are based entirely on the DMSO-treated control cells, and all other treatments within the experiment are compared using these gate settings.  In response to a specific manipulation (sotorasib, trametinib, DCC-3116, etc.) we assess how the treatment changes the percentages of cells in each of the pre-specified gates to assess increased autophagy (decreased EGFP:mCherry ratio) or decreased autophagy (increased EGFP:mCherry ratio). 

      Although LC3 processing and/or the expression of p62SQSTM1 are used by others as markers of autophagy, there is much debate in the literature as to how reliable immunoblotting analysis of LC3 processing or p62SQSTM1 expression are as measures of autophagy.  Certainly, in our hands, we find that the Fluorescent Autophagy Reporter is a much more sensitive measure of changes in autophagy in various different cancer cell lines as we have described in previous papers (Kinsey et al., PMID: 30833748, Truong et al., PMID: 32933997 and Silvis & Silva et al., PMID: 36719686).  Furthermore, in the omnibus publication that describes techniques for measuring autophagy (Klionsky et al., PMID: 33634751) the use of the FAR (or similarly configured reporters) is regarded as the gold standard for measuring autophagy status in cells.  We have amended the Materials & Methods section of our manuscript to better describe the use of the FAR in measuring autophagy. 
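      The gating scheme described in this response can be sketched as follows. This is a simplified illustration, not the analysis pipeline used in the manuscript: it treats the three gates as tertile cut-offs on the EGFP:mCherry ratio of control cells, and the function names and example values are hypothetical. A lower ratio is read as higher autophagy, as stated above.

```python
def tertile_gates(control_ratios):
    """Return (low_cut, high_cut) splitting control EGFP:mCherry ratios into thirds."""
    ordered = sorted(control_ratios)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three control cells to define three gates")
    low_cut = ordered[n // 3]          # ratios below this fall in the lowest third
    high_cut = ordered[(2 * n) // 3]   # ratios at or above this fall in the highest third
    return low_cut, high_cut


def classify(ratio, gates):
    """Assign one cell to an autophagy bin using gates set on control cells."""
    low_cut, high_cut = gates
    if ratio < low_cut:
        return "high"      # mCherry-dominant: autophagy high
    if ratio >= high_cut:
        return "low"       # EGFP-dominant: autophagy low
    return "medium"


def gate_percentages(ratios, gates):
    """Percentage of cells in each pre-specified gate."""
    counts = {"high": 0, "medium": 0, "low": 0}
    for r in ratios:
        counts[classify(r, gates)] += 1
    total = len(ratios)
    return {k: 100.0 * v / total for k, v in counts.items()}


if __name__ == "__main__":
    control = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical ratios, arbitrary units
    treated = [1, 1, 2, 2, 3, 5]            # hypothetical treatment shifting ratios down
    gates = tertile_gates(control)
    print(gate_percentages(control, gates))
    print(gate_percentages(treated, gates))
```

      Because the gates are fixed on the control sample, a treatment that lowers the EGFP:mCherry ratio moves cells into the "high" gate, which is how an increase in autophagy is read out.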

      (2) For Figures 1J, K, and L, please provide immunohistochemistry (IHC) images demonstrating RAS downstream signaling blockade by sotorasib and autophagy blockade by DCC 3116 in tumors.

      We thank the reviewer for the comment and have probed the tumors from the xenograft experiments in Figures 1J, K, and L for pERK1/2 and p62SQSTM1 to determine the biochemical activity of sotorasib or DCC-3116, respectively and have provided representative images below. We observed the expected decrease in pERK and p62 signal after sotorasib treatment in all three xenografted cell lines. We did observe the expected accumulation of p62 in the DCC-3116 treated tumors from the NCI-H2122 and NCI-H358 cell lines. There appears to be no difference between the vehicle and DCC-3116 treated tumors in the NCI-H358 cell line-derived tumors as detected by IHC.

      Author response image 1.

      (3) Given that both DCC 3116 and ULK1K46N exhibit the ability to inhibit autophagy and synergize with sotorasib in inhibiting cell proliferation, in addition to demonstrating decreased levels of pATG13 via ELISA assay, please include Western blot analyses of LC3 or p62 to confirm the blockade of autophagy by DCC 3116 and ULK1K46N in Figure 1 & Figure 2.

      We appreciate the reviewer's comment and have performed an immunoblot analysis of cells treated with DCC-3116 or expressing ULK1K46N and probed for p62SQSTM1 and LC3 expression.  We did observe the expected accumulation of p62SQSTM1 in NCI-H2122 (ULK1K46N) cells treated with 1 µg/ml doxycycline to induce expression of ULK1K46N compared to DMSO treatment.  Additionally, we treated the human cell lines from Figure 1 with sotorasib and/or DCC-3116 and tested for p62SQSTM1 expression after 48 hours of treatment. In the human cell lines NCI-H2122 and NCI-H358, there was a decrease in the p62 signal with increasing doses of sotorasib, as expected. There was no detectable change in p62 levels in the Calu-1 cells by immunoblot. For LC3-I/LC3-II, there was only one detectable band in the NCI-H2122 cells, which makes it difficult to interpret the results and further emphasizes why we use the fluorescent autophagy reporter, which is more sensitive than immunoblotting. There is no detectable change in LC3-I/LC3-II in the Calu-1 cells treated with increasing doses of sotorasib, but the expected decrease in LC3-I is observed with sotorasib treatment in the NCI-H358 cells.

      Author response image 2.

      (4) Since adenocarcinomas, adenosquamous carcinomas (ASC), and mucinous adenocarcinomas were detected in KL lung tumors, please conduct immunohistochemistry (IHC) to detect these tumors, including markers such as p63, SOX2, Keratin 5.

      We have included IHC analysis of the adenosquamous carcinomas for the markers p63, SOX2, and Keratin 5 from the KL mouse in Figure 3 and the ASC tumors in Supplemental Figure 4, and thank the reviewer for this excellent suggestion. The staining for these markers is below. Of note, we tried two different SOX2 antibodies (Cell Signaling Technology #14962 and Cell Signaling Technology #3728) and could not detect any staining in any section.

      Author response image 3.

      (5) Please provide the sample size (n) for each treatment group in the survival study (Figure 4E). It appears that all mice were sacrificed for tumor burden analysis in Figure 4F. However, there doesn't seem to be a significant difference among the treatment groups in Figure 4F, which contrasts with the survival analysis in Figure 4E. It is suggested to increase the sample size in each treatment group to reduce variation.

      We have updated Figure 4E to indicate sample size for each treatment group and thank the reviewer for this suggestion.  Any mice that remained on study through the entire 8-week treatment regimen were sacrificed after the last day of treatment (Day 56).  Figure 4F indicates analysis of total tumor burden in all mice that remained on treatment for the full 8 weeks and mice that reached euthanasia criteria before the end of the 8-week treatment.  Therefore, it is important to note that the mice in Figure 4F were not all euthanized on the same day.  There is no statistically significant difference between the 3 treatment groups (sotorasib, DCC-3116, combination).  This may be due to a lower sample size as well as ending the treatment at 8 weeks as opposed to continuing the treatment for a longer period of time.  Although we agree that increasing the sample size would benefit the study, due to how long the GEMM model experiments take (12-16 weeks of breeding, 6 weeks for the mice to reach adulthood, 10 weeks of tumor formation post-initiation, 8 weeks of treatment= ~40 weeks) we would respectfully submit that the analysis of additional mice is outside the scope of the current revised manuscript.

      (6) In KP mice (Figure 5), it seems that a single treatment alone is sufficient to inhibit established KP lung tumor growth. Combination treatment does not further enhance anti-tumor efficacy. Therefore, this result doesn't support the conclusion generated from human cancer cell lines. Please discuss.

      We thank the reviewer for this observation.  Indeed, KP lung tumors were sensitive to single agent DCC-3116 treatment, which is reflected in the tumor burden analysis.  This was somewhat surprising to us as we have not previously detected much anti-tumor activity using 4-aminoquinolines (chloroquine or hydroxychloroquine) or other autophagy inhibitors.  It should be noted however that the KRASG12C/TP53R175H NSCLC model has a very low tumor burden overall (~4% in vehicle-treated mice).  Additionally, our microCT imager cannot detect AAH and small tumors at the settings/resolution used.  Therefore, we were limited in our ability to detect small tumors or hyperplasia by microCT imaging.  Although there was a decrease in overall tumor burden with single agent DCC-3116 treatment, we could not demonstrate using microCT imaging that KRASG12C/TP53R175H lung tumors were actually regressing with single agent DCC-3116 treatment.  The larger tumors that were detected appeared to show a cytostatic effect (i.e. no or slow growth) with DCC-3116 monotherapy.  This may reflect our inability to detect regression of AAH or small tumors with the microCT.  In all human cell lines tested, the only cell line that responded to single agent DCC-3116 treatment was NCI-H358 cells, which do have a complete heterozygous loss of the TP53 gene and lack TP53 protein.  However, other cells that also have a loss of TP53 expression (Calu-1) are insensitive to single-agent DCC-3116 treatment. Due to the low mutational burden of the KP mouse model compared to human NSCLC cell lines driven by mutationally-activated KRASG12C and the loss of TP53 function, it is difficult to directly compare GEM models to the human cell line models.  Most of the human cell lines have alterations in other genes that are not altered in the KP mouse model, which could affect the sensitivity to treatment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure legends are currently not adequate - information about the number and nature of replicates, stats, and definitions of the labelling used for stats should be added throughout. In Figure 5B, only two lines of four are labelled with * or ns.

      We thank the reviewer for this comment and have included more details in the figure legends that describe replicates, statistical analysis and definitions of labeling.  We also note that the methods section has a detailed description of the statistical analysis used.

      (2) What statistical test is performed on Figure 5E to get a p < 0.05 between the vehicle and DCC group?

      We performed a one-way ANOVA for all statistical analyses with more than 2 experimental groups. We thank the reviewer for pointing out this typo. These data points (vehicle vs. DCC-3116) are not statistically significant, which has been revised in the figure.

      (3) The manuscript figures would be improved by the use of a colourblind-friendly palette.

      We have previously published multiple manuscripts using this color scheme for the fluorescent autophagy reporter experiments and chose to use red and green as the reporter uses EGFP and mCherry.  We wanted to keep this color scheme consistent across our publications and would prefer not to change the colors.  However, we agree with the reviewer that the data should be accessible to all people and, therefore, have updated these graphs to include slashes over the red color to make the red and green colors easier to tell apart.  Thank you to the reviewer for this excellent suggestion.

      (4) The manuscript should be fully checked for mouse (sentence case) and human (caps) gene (italics) and protein (non-italics).

      In this manuscript we are using the nomenclatures approved by the HUGO Gene Nomenclature Committee (https://en.wikipedia.org/wiki/HUGO_Gene_Nomenclature_Committee) in which:

      Human genes are written as KRAS, TP53 etc i.e. ITALICIZED CAPS

      Mouse genes are written as Kras, Trp53 etc:  i.e. Italicized and sentence case

      Human and mouse proteins are written as KRAS, TP53 etc:  i.e. NON-ITALICIZED CAPS

      In response to the reviewer’s suggestion, we have gone through the manuscript to check for this and make any appropriate changes.  Of note, we intentionally refer to the mouse protein changes as KRASG12C/LKB1null or KRASG12C/TP53R172H (capitalized), as this references the protein change and not the nucleotide change that occurs in the gene.

      (5) Adenosquamous is the correct term for the disease.  In parts, it's referred to as adeno/squamous or adeno-squamous.  The abbreviation ADC is also defined many times.

      Thank you to the reviewer for this comment.  We have corrected the manuscript text to only use adenosquamous and only define ADC in the first instance.

      (6) Line 434 - "as previously described" but no reference.

      Typos:

      (1) Line 117 – either

      (2) Line 314 – synergistic

      (3) Line 317 – therefore

      (4) Line 502 – medium

      We thank the reviewer for pointing out these typos and have modified the text appropriately.

      Reviewer #2 (Recommendations For The Authors):

      (1) The statement on Page 4, Lines 119-120, lacks clarity: 'Furthermore, LKB1 silencing diminishes the sensitivity of KRASG12C/LKB1Null-driven lung cancer perhaps through the emergence of mixed adeno/squamous cell carcinomas and mucinous adenocarcinomas.'  It is unclear whether this refers to the sensitivity to the combination treatment or to the KRASG12C inhibitor alone.

      We thank the reviewer for this comment and agree that the statement lacks clarity.  The intent of this statement was to refer to both single agent sotorasib treatment as well as the combination with DCC-3116.  

      (2) Page 5 Line 147 "KRASG12X ". Please correct this typo.

      We thank the reviewer for this comment, but this is not a typo. We intended for this line to state KRASG12X to refer to cell lines with any KRASG12 alteration, e.g., KRASG12D, KRASG12C, KRASG12S, KRASG12R, etc.  

      (3) The color of the dots in Figure 5B labeling does not match the dots in the graph.

      For all bar graphs in the manuscript, the dots representing individual mice are black, and the bar itself is color-coded based on treatment type. The dots in Figure 5B follow this pattern and are intended to be this way.

      (4) Figure 5C depicts lung weight rather than tumor growth, contrary to the text description "regression of pre-existing lung tumors was detected by microCT scanning (Figure 5C, Figure S5)".

      Figure 5C does not depict lung weight but the percent body weight change in treated mice, described in the figure legend.  We thank the reviewer for pointing this out because we referenced the wrong panel in the text.  The figures referenced should be Figure 5B, Figure S5.  We have corrected this in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors used structural and biophysical methods to provide insight into Parkin regulation. The breadth of data supporting their findings was impressive and generally well-orchestrated. Still, the impact of their results builds on recent structural studies and the stated impact is based on these prior works.

      Strengths:

      (1) After reading through the paper, the major findings are:

      - RING2 and pUbl compete for binding to RING0.

      - Parkin can dimerize.

      - ACT plays an important role in enzyme kinetics.

      (2) The use of molecular scissors in their construct represents a creative approach to examining inter-domain interactions.

      (3) From my assessment, the experiments are well-conceived and executed.

      We thank the reviewer for their positive remark and extremely helpful suggestions.

      Weaknesses:

      The manuscript, as written, is NOT for a general audience. Admittedly, I am not an expert on Parkin structure and function, but I had to do a lot of homework to try to understand the underlying rationale and impact. This reflects, I think, that the work generally represents an incremental advance on recent structural findings.

      To this point, it is hard to understand the impact of this work without more information highlighting the novelty. There are several structures of Parkin in various auto-inhibited states, and it was hard to delineate how this is different.

      For the sake of the general audience, we have included all the details of Parkin structures and conformations seen (Extended Fig. 1). The structures in the present study are to validate the biophysical/biochemical experiments, highlighting key findings. For example, we solved the phospho-Parkin (complex with pUb) structure after treatment with 3C protease (Fig. 2C), which washes off the pUbl-linker, as shown in Fig. 2B. The structure of the pUbl-linker depleted phospho-Parkin-pUb complex showed that RING2 returned to the closed state (Fig. 2C), which confirms the SEC assay in Fig. 2B. Similarly, the structure of the pUbl-linker depleted phospho-Parkin R163D/K211N-pUb complex (Fig. 3C) was solved to validate the SEC data showing that displacement of the pUbl-linker is independent of pUbl interaction with the basic patch on RING0 (Fig. 3B). In addition, the latter structure also revealed a new donor ubiquitin binding pocket in the linker (connecting REP and RING2) region of Parkin (Fig. 9). Similarly, the trans-complex structure of phospho-Parkin (Fig. 4D) was solved to validate the biophysical data (Fig. 4A-C, Fig. 5A-D) showing a trans-complex between phospho-Parkin and native Parkin. The latter also confirmed that the trans-complex was mediated by interactions between pUbl and the basic patch on RING0 (Fig. 4D). Furthermore, we noticed that the ACT region was disordered in the trans-complex between phospho-Parkin (1-140 + 141-382 + pUb) (Fig. 8A), which had ACT from the trans molecule, indicating ACT might be present in the cis molecule. The latter was validated from the structure of the trans-complex between phospho-Parkin with cis ACT (1-76 + 77-382 + pUb) (Fig. 8C), showing the ordered ACT region. The structural finding was further validated by biochemical assays (Fig. 8D-F, Extended Data Fig. 9C-E).

      The structure of TEV-treated R0RBR (TEV) (Extended Data Fig. 4C) was done to ensure that the inclusion of TEV and treatment with TEV protease did not perturb Parkin folding, an important control for our biophysical experiments.

      As noted, I appreciated the use of protease sites in the fusion protein construct. It is unclear how the loop region might affect the protein structure and function. The authors worked to demonstrate that this did not introduce artifacts, but the biological context is missing.

      We thank the reviewer for appreciating the use of protease sites in the fusion protein construct.  Protease sites were used to overcome the competing mode of binding that makes interactions very transient and beyond the detection limit of methods such as ITC or SEC. While these interactions are quite transient in nature, they could still be useful for the activation of various Parkin isoforms that lack either the Ubl domain or RING2 domain (Extended Data Fig. 6, Fig. 10). Our Parkin localization assays also suggest an important role of these interactions in the recruitment of Parkin molecules to the damaged mitochondria (Fig. 6).

      While it is likely that the binding is competitive between the Ubl and RING2 domains, the data is not quantitative. Is it known whether the folding of the distinct domains is independent? Or are there interactions that alter folding? It seems plausible that conformational rearrangements may invoke an orientation of domains that would be incompatible. The biological context for the importance of this interaction was not clear to me.

      This is a great point. In the revised manuscript, we have included quantitative data between phospho-Parkin and untethered ∆Ubl-Parkin (TEV) (Fig. 5B) showing similar interactions using phospho-Parkin K211N and untethered ∆Ubl-Parkin (TEV) (Fig. 4B). Folding of Ubl domain or various combinations of RING domains lacking Ubl seems okay. Also, folding of the RING2 domain on its own appears to be fine. However, human Parkin lacking the RING2 domain seems to have some folding issues, mainly due to exposure of the hydrophobic pocket on RING0, as also suggested by previous efforts (Gladkova et al., ref. 24; Sauve et al., ref. 29).  The latter could be overcome by co-expression of the RING2-lacking Parkin construct with PINK1 (Sauve et al., ref. 29), as phospho-Ubl binds on the same hydrophobic pocket on RING0 where RING2 binds. A drastic reduction in the melting temperature of phospho-Parkin (Gladkova et al., ref. 24), very likely due to exposure of the hydrophobic surface between RING0 and RING2, correlates with the folding issues of RING0-exposed human Parkin constructs.

      From the biological context, the competing nature between phospho-Ubl and RING2 domains could block the non-specific interaction of phosphorylated-ubiquitin-like proteins (phospho-Ub or phospho-NEDD8) with RING0 (Lenka et al. ref. 33), during Parkin activation. 

      (5) What is the rationale for mutating Lys211 to Asn? Were other mutations tried? Glu? Ala? Just missing the rationale. I think this may have been identified previously in the field, but not clear what this mutation represents biologically.

      Lys211Asn is a Parkinson’s disease mutation; therefore, we decided to use the same mutation for biophysical studies.  

      I was confused about how the phospho-proteins were generated. After looking through the methods, there appear to be phosphorylation experiments, but it is unclear what the efficiency was for each protein (i.e. what % gets modified). In the text, the authors refer to phospho-Parkin (T270R, C431A), but not clear how these mutations might influence this process. I gather that these are catalytically inactive, but it is unclear to me how this is catalyzing the ubiquitination in the assay.

      This is an excellent question. Because different phosphorylation statuses would affect the analysis, we ensured complete phosphorylation status using Phos-Tag SDS-PAGE, as shown below.

      Author response image 1.

      Our biophysical experiments in Fig. 5C show that trans complex formation is mediated by interactions between the basic patch (comprising K161, R163, K211) on RING0 and phospho-Ubl domain in trans. These interactions result in the displacement of RING2 (Fig. 5C). Parkin activation is mediated by displacement of RING2 and exposure of catalytic C431 on RING2. While phospho-Parkin T270R/C431A is catalytically dead, the phospho-Ubl domain of phospho-Parkin T270R/C431A would bind to the basic patch on RING0 of WT-Parkin, resulting in activation of WT-Parkin as shown in Fig. 5E. A schematic figure is shown below to explain the same.

      Author response image 2.

      (7) The authors note that "ACT can be complemented in trans; however, it is more efficient in cis", but it is unclear whether both would be important or if the favored interaction is dominant in a biological context.

      First, this is an excellent question about the biological context of ACT and needs further exploration. While due to the flexible nature of ACT, it can be complemented both in cis and trans, we can only speculate cis interactions between ACT and RING0 could be more relevant from the biological context as during protein synthesis and folding, ACT would be translated before RING2, and thus ACT would occupy the small hydrophobic patch on RING0 in cis. Unpublished data shows the replacement of the ACT region by Biogen compounds to activate Parkin (https://doi.org/10.21203/rs.3.rs-4119143/v1). The latter finding further suggests the flexibility in this region.        

      (8) The authors repeatedly note that this study could aid in the development of small-molecule regulators against Parkin to treat PD, but this is a long way off. And it is not clear from their manuscript how this would be achieved. As stated, this is conjecture.

      As suggested by this reviewer, we have removed this point in the revised manuscript.

      Reviewer #2 (Public Review):

      This manuscript uses biochemistry and X-ray crystallography to further probe the molecular mechanism of Parkin regulation and activation. Using a construct that incorporates cleavage sites between different Parkin domains to increase the local concentration of specific domains (i.e., molecular scissors), the authors suggest that competitive binding between the p-Ubl and RING2 domains for the RING0 domain regulates Parkin activity. Further, they demonstrate that this competition can occur in trans, with a p-Ubl domain of one Parkin molecule binding the RING0 domain of a second monomer, thus activating the catalytic RING1 domain. In addition, they suggest that the ACT domain can similarly bind and activate Parkin in trans, albeit at a lower efficiency than that observed for p-Ubl. The authors also suggest from crystal structure analysis and some biochemical experiments that the linker region between RING2 and repressor elements interacts with the donor ubiquitin to enhance Parkin activity.

      Ultimately this manuscript challenges previous work suggesting that the p-Ubl domain does not bind to the Parkin core in the mechanism of Parkin activation. The use of 'molecular scissors' is an interesting way to probe this type of competitive binding. However, there are issues with the experimental approach that detract from the overall quality and potential impact of the work.

      We thank the reviewer for their positive remark and constructive suggestions.

      The competitive binding between p-Ubl and RING2 domains for the Parkin core could have been better defined using biophysical and biochemical approaches that explicitly define the relative affinities that dictate these interactions. A better understanding of these affinities could provide more insight into the relative bindings of these domains, especially as it relates to the in trans interactions.

      This is an excellent point regarding the relative affinities of pUbl and RING2 for the Parkin core (lacking Ubl and RING2). While we could purify p-Ubl, we failed to purify human Parkin (lacking RING2 and phospho-Ubl). The latter folding issues were likely due to the exposure of a highly hydrophobic surface on RING0 (as shown below) in the absence of pUbl and RING2 in the R0RB construct. Also, RING2 with an exposed hydrophobic surface would be prone to folding issues, which is not suitable for affinity measurements. A drastic reduction in the melting temperature of phospho-Parkin (Gladkova et al., ref. 24) also highlights the importance of a hydrophobic surface between RING0 and RING2 on Parkin folding/stability. A separate study would be required to try these Parkin constructs from different species and ensure proper folding before using them for affinity measurements.

      Author response image 3.

      I also have concerns about the results of using molecular scissors to 'increase local concentrations' and allow for binding to be observed. These experiments are done primarily using proteolytic cleavage of different domains followed by size exclusion chromatography. ITC experiments suggest that the binding constants for these interactions are in the µM range, although these experiments are problematic as the authors indicate in the text that protein precipitation was observed during these experiments. This type of binding could easily be measured in other assays. My issue relates to the ability of a protein complex (comprising the core and cleaved domains) with a Kd of 1 µM to be maintained in an SEC experiment. The off-rates for these complexes must be exceedingly slow, which doesn't really correspond to the low µM binding constants discussed in the text. How do the authors explain this? What is driving the Koff to levels sufficiently slow to prevent dissociation by SEC? Considering that the authors are challenging previous work describing the lack of binding between the p-Ubl domain and the core, these issues should be better resolved in this current manuscript. Further, it's important to have a more detailed understanding of relative affinities when considering the functional implications of this competition in the context of full-length Parkin. Similar comments could be made about the ACT experiments described in the text.

      This is a great point. In the revised manuscript, we repeated ITC measurements in a different buffer system, which gave nice ITC data. In the revised manuscript, we have also performed ITC measurements using native phospho-Parkin. Phospho-Parkin and untethered ∆Ubl-Parkin (TEV) (Fig. 5B) show similar affinities as seen between phospho-Parkin K211N and untethered ∆Ubl-Parkin (TEV) (Fig. 4B). However, Kd values were consistent in the range of 1.0 ± 0.4 µM which could not address the reviewer’s point regarding slow off-rate. The crystal structure of the trans-complex of phospho-Parkin shows several hydrophobic and ionic interactions between p-Ubl and Parkin core, suggesting a strong interaction and, thus, justifying the co-elution on SEC. Additionally, ITC measurements between E2-Ub and P-Parkin-pUb show similar affinity (Kd = 0.9 ± 0.2 µM) (Kumar et al., 2015, EMBO J.), and yet they co-elute on SEC (Kumar et al., 2015, EMBO J.).
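      The off-rate question raised by the reviewer can be made concrete with a back-of-the-envelope estimate. For simple 1:1 binding, koff = Kd × kon. The sketch below takes the ~1 µM Kd reported above and assumes an association rate constant between 1e5 and 1e6 M^-1 s^-1; that range is an assumption chosen for illustration and was not measured in this study, and the function names are hypothetical.

```python
import math


def off_rate(kd_molar, kon_per_molar_per_s):
    """koff (s^-1) for simple 1:1 binding, from Kd = koff / kon."""
    return kd_molar * kon_per_molar_per_s


def half_life_s(koff_per_s):
    """Half-life (s) of a complex that dissociates with first-order rate koff."""
    return math.log(2) / koff_per_s


if __name__ == "__main__":
    kd = 1e-6  # ~1 µM, from the ITC values quoted in the response
    # Assumed association rate constants (M^-1 s^-1); not measured in the study.
    for kon in (1e5, 1e6):
        koff = off_rate(kd, kon)
        print(f"kon={kon:.0e} M^-1 s^-1 -> koff={koff:.2g} s^-1, "
              f"half-life={half_life_s(koff):.2g} s")
```

      Under these assumed association rates, the complex half-life comes out on the order of seconds. SEC co-elution also depends on factors such as loading concentration and rebinding on the column, so this estimate frames the reviewer's question rather than settling it.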

      Ultimately, this work does suggest additional insights into the mechanism of Parkin activation that could contribute to the field. There is a lot of information included in this manuscript, giving it breadth, albeit at the cost of depth for the study of specific interactions. Further, I felt that the authors oversold some of their data in the text, and I'd recommend being a bit more careful when claiming an experiment 'confirms' a specific model. In many cases, there are other models that could explain similar results. For example, in Figure 1C, the authors state that their crystal structure 'confirms' that "RING2 is transiently displaced from the RING0 domain and returns to its original position after washing off the p-Ubl linker". However, it isn't clear to me that RING2 ever dissociated when prepared this way. While there are issues with the work that I feel should be further addressed with additional experiments, there are interesting mechanistic details suggested by this work that could improve our understanding of Parkin activation. However, the full impact of this work won't be fully appreciated until there is a more thorough understanding of the regulation and competitive binding between p-Ubl and RING2 to RORB both in cis and in trans.

      We thank the reviewer for their positive comment. In the revised manuscript, we have included the reviewer’s suggestion. The conformational changes in phospho-Parkin were established from the SEC assay (Fig. 2A and Fig. 2B), which show displacement/association of phospho-Ubl or RING2 after treatment of phospho-Parkin with 3C and TEV, respectively. For crystallization, we first phosphorylated Parkin, where RING2 is displaced due to phospho-Ubl (as shown in SEC), followed by treatment with 3C protease, which led to pUbl wash-off. The Parkin core separated from phospho-Ubl on SEC was used for crystallization and structure determination in Fig. 2C, where RING2 returned to the RING0 pocket, which confirms SEC data (Fig. 2B).

      Reviewer #3 (Public Review):

      Summary:

      In their manuscript "Additional feedforward mechanism of Parkin activation via binding of phospho-UBL and RING0 in trans", Lenka et al present data that could suggest an "in trans" model of Parkin ubiquitination activity. Parkin is an intensely studied E3 ligase implicated in mitophagy, whereby missense mutations to the PARK2 gene are known to cause autosomal recessive juvenile parkinsonism. From a mechanistic point of view, Parkin is extremely complex. Its activity is tightly controlled by several modes of auto-inhibition that must be released by cues of mitochondrial damage. While the general overview of Parkin activation has been mapped out in recent years, several details have remained murky. In particular, whether Parkin dimerizes as part of its feed-forward signaling mechanism, and whether said dimerization can facilitate ligase activation, has remained unclear. Here, Lenka et al. use various truncation mutants of Parkin in an attempt to understand the likelihood of dimerization (in support of an "in trans" model for catalysis).

      Strengths:

      The results are bolstered by several distinct approaches including analytical SEC with cleavable Parkin constructs, ITC interaction studies, ubiquitination assays, protein crystallography, and cellular localization studies.

      We thank the reviewer for their positive remark.

      Weaknesses:

      As presented, however, the storyline is very confusing to follow and several lines of experimentation felt like distractions from the primary message. Furthermore, many experiments could only indirectly support the authors' conclusions, and therefore the final picture of what new features can be firmly added to the model of Parkin activation and function is unclear.

      We thank the reviewer for their constructive criticism, which has helped us to improve the quality of this manuscript.

      Major concerns:

      (1) This manuscript solves numerous crystal structures of various Parkin components to help support their idea of in trans transfer. The way these structures are presented more closely resembles models, and it is unclear from the figures that these are new complexes solved in this work, and what new insights can be gleaned from them.

      The structures in the present study serve to validate the biophysical/biochemical experiments and to highlight key findings. For example, we solved the phospho-Parkin (complex with pUb) structure after treatment with 3C protease (Fig. 2C), which washes off the pUbl-linker, as shown in Fig. 2B. The structure of the pUbl-linker depleted phospho-Parkin-pUb complex showed that RING2 returned to the closed state (Fig. 2C), which confirms the SEC assay in Fig. 2B. Similarly, the structure of the pUbl-linker depleted phospho-Parkin R163D/K211N-pUb complex (Fig. 3C) was solved to validate the SEC data showing that displacement of the pUbl-linker is independent of pUbl interaction with the basic patch on RING0 (Fig. 3B). In addition, the latter structure also revealed a new donor ubiquitin binding pocket in the linker (connecting REP and RING2) region of Parkin (Fig. 9). Similarly, the trans-complex structure of phospho-Parkin (Fig. 4D) was solved to validate the biophysical data (Fig. 4A-C, Fig. 5A-D) showing a trans-complex between phospho-Parkin and native Parkin. The latter also confirmed that the trans-complex was mediated by interactions between pUbl and the basic patch on RING0 (Fig. 4D). Furthermore, we noticed that the ACT region was disordered in the trans-complex between phospho-Parkin (1-140 + 141-382 + pUb) (Fig. 8A), which had ACT from the trans molecule, indicating that ACT might be present in the cis molecule. The latter was validated by the structure of the trans-complex between phospho-Parkin with cis ACT (1-76 + 77-382 + pUb) (Fig. 8C), showing the ordered ACT region. The structural finding was further validated by biochemical assays (Fig. 8D-F, Extended Data Fig. 9C-E).

      The structure of TEV-treated R0RBR (TEV) (Extended Data Fig. 4C) was done to ensure that the inclusion of TEV and treatment with TEV protease did not perturb Parkin folding, an important control for our biophysical experiments.

      (2) There are no experiments that definitively show the in trans activation of Parkin. The binding experiments and size exclusion chromatography are a good start, but the way these experiments are performed, they'd be better suited as support for a stronger experiment showing Parkin dimerization. In addition, the rationale for an in trans activation model is not convincingly explained until the concept of Parkin isoforms is introduced in the Discussion. The authors should consider expanding this concept into other parts of the manuscript.

      We thank the reviewer for acknowledging the data on Parkin dimerization. Our biophysical data in Fig. 5C show that Parkin dimerization is mediated by interactions between phospho-Ubl and RING0 in trans, leading to the displacement of RING2. However, the Parkin K211N (on RING0) mutation perturbs interaction with phospho-Parkin and leads to loss of Parkin dimerization and loss of RING2 displacement (Fig. 5C). The interaction between pUbl and the K211 pocket on RING0 leads to the displacement of RING2, resulting in Parkin activation as the catalytic residue C431 on RING2 is exposed for catalysis. The biophysical experiment is further confirmed by a biochemical experiment in which the addition of catalytically inactive phospho-Parkin T270R/C431A activates autoinhibited WT-Parkin in trans using the mechanism discussed above (a schematic representation is also shown in Author response image 2).

      We thank the reviewer for the suggestion regarding Parkin isoforms. In the revised manuscript, we have included Parkin isoforms in the results section, too.

      (2a) For the in trans activation experiment using wt Parkin and pParkin (T270R/C431A) (Figure 3D), there needs to be a large excess of pParkin to stimulate the catalytic activity of wt Parkin. This experiment has low cellular relevance as these point mutations are unlikely to occur together to create this nonfunctional pParkin protein. In the case of pParkin activating wt Parkin (regardless of artificial point mutations inserted to study specifically the in trans activation), if there needs to be much more pParkin around to fully activate wt Parkin, isn't it just more likely that the pParkin would activate in cis?

      To test phospho-Parkin as an activator of Parkin in trans, we wanted to use the catalytically inactive version of phospho-Parkin to avoid the background activity of p-Parkin. While it is true that a large excess of pParkin (T270R/C431A) is required to activate WT-Parkin in the in vitro set-up, it is not very surprising as in WT-Parkin, the unphosphorylated Ubl domain would block the E2 binding site on RING1. Also, due to interactions between pParkin (T270R/C431A) molecules, the net concentration of pParkin (T270R/C431A) as an activator would be much lower. However, the Ubl blocking E2 binding site on RING1 won’t be an issue between phospho-Parkin molecules or between Parkin isoforms (lacking Ubl domain or RING2).

      (2ai) Another underlying issue with this experiment is that the authors do not consider the possibility that the increased activity observed is a result of increased "substrate" for auto-ubiquitination, as opposed to any role in catalytic activation. Have the authors considered looking at Miro as a substrate in order to control for this?

      This is quite an interesting point. However, this would only be possible if Parkin is ubiquitinated in trans, as auto-ubiquitination is possible with active Parkin and not with catalytically dead (phospho-Parkin T270R/C431A) or autoinhibited (WT-Parkin) protein. Also, in the previous version of the manuscript, where we used only phospho-Ubl as an activator of Parkin in trans, we tested Miro1 ubiquitination and auto-ubiquitination, and the results were the same (Author response image 4).

      Author response image 4.

      (2b) The authors mention a "higher net concentration" of the "fused domains" with RING0, and use this to justify artificially cleaving the Ubl or RING2 domains from the Parkin core. This fact should be moot. In cells, it is expected there will only be a 1:1 ratio of the Parkin core with the Ubl or RING2 domains. To date, there is no evidence suggesting multiple pUbls or multiple RING2s can bind the RING0 binding site. In fact, the authors here even show that either the RING2 or pUbl needs to be displaced to permit the binding of the other domain. That being said, there would be no "higher net concentration" because there would always be the same molar equivalents of Ubl, RING2, and the Parkin core.

      We apologize for the confusion. “Higher net concentration” is with respect to fused domains versus the domain provided in trans. Due to the competing nature of the interactions between pUbl/RING2 and RING0, the interactions are too transient and beyond the detection limit of the biophysical techniques. While the domains are fused in a polypeptide (for example, RING0-RING2 in the same polypeptide), their effective concentrations are much higher than those of domains provided in trans (for example, pUbl); thus, biophysical methods fail to detect the interaction. Treatment with protease resolves the issue caused by the higher net concentration of the fused domain, so that trans interactions can be measured using biophysical techniques. However, these interactions and conformational changes are very transient in nature, which is also suggested by the data. Therefore, Parkin molecules will never remain associated; rather, Parkin will transiently interact with and activate Parkin molecules in trans.

      (2c) A larger issue remaining in terms of Parkin activation is the lack of clarity surrounding the role of the linker (77-140); particularly whether its primary role is to tether the Ubl to the cis Parkin molecule versus a role in permitting distal interactions to a trans molecule. The way the authors have conducted the experiments presented in Figure 2 limits the possible interactions that the activated pUbl could have by (a) ablating the binding site in the cis molecule with the K211N mutation; (b) further blocking the binding site in the cis molecule by keeping the RING2 domain intact. These restrictions to the cis parkin molecule effectively force the pUbl to bind in trans. A competition experiment to demonstrate the likelihood of cis or trans activation in direct comparison with each other would provide stronger evidence for trans activation.

      This is an excellent point. In the revised manuscript, we have performed experiments using native phospho-Parkin (Revised Figure 5), and the results are consistent with those in Figure 2 (Revised Figure 4), where we used the K211N mutation.

      (3) A major limitation of this study is that the authors interpret structural flexibility from experiments that do not report directly on flexibility. The analytical SEC experiments report on binding affinity and more specifically off-rates. By removing the interdomain linkages, the accompanying on-rate would be drastically impacted, and thus the observations are disconnected from a native scenario. Likewise, observations from protein crystallography can be consistent with flexibility, but certainly should not be directly interpreted in this manner. Rigorous determination of linker and/or domain flexibility would require alternative methods that measure this directly.

      We agree with the reviewer that these methods do not directly capture structural flexibility, and that rigorous determination of linker flexibility would require alternative methods that measure this directly. However, due to the complex nature of the interactions and technical limitations, breaking the interdomain linkages was the best possible way to capture interactions in trans. Interestingly, all previous methods that report cis interactions between pUbl and RING0 also used a similar approach (Gladkova et al., ref. 24; Sauve et al., ref. 29).

      (4) The analysis of the ACT element comes across as incomplete. The authors make a point of a competing interaction with Lys48 of the Ubl domain, but the significance of this is unclear. It is possible that this observation could be an overinterpretation of the crystal structures. Additionally, the rationale for why the ACT element should or shouldn't contribute to in trans activation of different Parkin constructs is not clear. Lastly, the conclusion that this work explains the evolutionary nature of this element in chordates is highly overstated.

      We agree with the reviewer that the significance of Lys48 is unclear. We have presented this just as one of the observations from the crystal structure. As the reviewer suggested, we have removed the sentence about the evolutionary nature of this element from the revised manuscript.

      (5) The analysis of the REP linker element also seems incomplete. The authors identify contacts to a neighboring pUb molecule in their crystal structure, but the connection between this interface (which could be a crystallization artifact) and their biochemical activity data is not straightforward. The analysis of flexibility within this region using crystallographic and AlphaFold modeling observations is very indirect. The authors also draw parallels with linker regions in other RBR ligases that are involved in recognizing the E2-loaded Ub. Firstly, it is not clear from the text or figures whether the "conserved" hydrophobic within the linker region is involved in these alternative Ub interfaces. And secondly, the authors appear to jump to the conclusion that the Parkin linker region also binds an E2-loaded Ub, even though their original observation from the crystal structure seems inconsistent with this. The entire analysis feels very preliminary and also comes across as tangential to the primary storyline of in trans Parkin activation.

      We agree with the reviewer that the crystal structure data and biochemical data are not directly linked. In the revised manuscript, we have also highlighted the conserved hydrophobic residue in the linker region at the ubiquitin interface (Fig. 9C and Extended Data Fig. 11A), which was inadvertently omitted from the original manuscript. We want to add that a very similar analysis and supporting experiments identified donor ubiquitin-binding sites on the IBR and the helix connecting RING1-IBR (Kumar et al., Nature Str. and Mol. Biol., 2017), which several other groups later confirmed. In that study, the Ubl domain of the symmetry-mate Parkin molecule was identified as a mimic of “donor ubiquitin” on the IBR and the helix connecting RING1-IBR.

      In the present study, a neighboring pUb molecule in the crystal structure is identified as a donor ubiquitin mimic (Fig. 9C) by supporting biophysical/biochemical experiments. First, we show that the I411A mutation in the REP linker of Parkin perturbs Parkin interaction with E2~Ub (donor) (Fig. 9F). Another supporting experiment was performed using a Ubiquitin-VS probe assay, which is independent of E2. Assays using Ubiquitin-VS show that the I411A mutation in the REP-RING2 linker perturbs Parkin charging with Ubiquitin-VS (Extended Data Fig. 11B). Furthermore, the biophysical data showing loss of Parkin interaction with donor ubiquitin are supported by ubiquitination assays. Mutations in the REP-RING2 linker perturb Parkin activity (Fig. 9E), confirming the biophysical data. This is further confirmed by mutations (L71A or L73A) on ubiquitin (Extended Data Fig. 11C), which result in loss of Parkin activity. The above experiments establish the role of the REP-RING2 linker in interaction with donor ubiquitin, which is consistent with other RBRs (Extended Data Fig. 11A).

      While we agree with the reviewer that this appears tangential to the primary storyline of in trans Parkin activation, we decided to include this data because it could be of interest to the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) For clarity, a schematic of the domain architecture of Parkin would be helpful at the outset in the main figures. This will help with the introduction to better understand the protein organization. This is lost in the Extended Figure in my opinion.

      We thank the reviewer for suggesting this, which we have included in Figure 1 of the revised manuscript.

      (2) Related to the competition between the Ubl and RING2 domains, can competition be shown through another method? SPR, ITC, etc? ITC was used in other experiments, but only in the context of mutations (Lys211Asn)? Can this be done with WT sequence?

      This is an excellent suggestion. In the revised Figure 5, we have performed ITC experiments using WT Parkin, and the results are consistent with what we observed using Lys211Asn Parkin.

      (3) The authors also note that "the AlphaFold model shows a helical structure in the linker region of Parkin (Extended Data Figure 10C), further confirming the flexible nature of this region"... but the secondary structure would not be inherently flexible. This is confusing.

      The flexibility is in terms of the conformation of this linker region observed under the open or closed state of Parkin. In the revised manuscript, we have explained this point more clearly.

      (4) The manuscript needs extensive revision to improve its readability. Minor grammatical mistakes were prevalent throughout.

      We thank the reviewer for pointing this out; we have corrected these mistakes in the revised manuscript.

      (5) The confocal images are nice, but inset panels may help highlight the regions of interest (ROIs).

      This is corrected in the revised manuscript.

      (6) Trans is misspelled ("tans") towards the end of the second paragraph on page 16.

      This is corrected in the revised manuscript.

      (7) The schematics are helpful, but some of the lettering in Figure 2 is very small.

      This is corrected in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) A significant portion of the results section refers to the supplement, making the overall readability very difficult.

      We acknowledge this issue; a lot of relevant data could not be added to the main figures and thus ended up in the supplement. In the revised manuscript, we have moved some of the supplementary figures to the main figures.

      (2) Interpretation of the experiments utilizing many different Parkin constructs and cleavage scenarios (particularly the SEC and crystallography experiments) is extremely difficult. The work would benefit from a layout of the Parkin model system, highlighting cleavage sites, key domain terminology, and mutations used in the study, presented together and early on in the manuscript. Using this to identify a simpler system of referencing Parkin constructs would also be a large improvement.

      This is a great suggestion. We have included these points in the revised manuscript, which has improved the readability.

      (3) Lines 81-83; the authors say they "demonstrate the conformational changes in Parkin during the activation process", but fail to show any actual conformational changes. Further, much of what is demonstrated in this work (in terms of crystal structures) corroborates existing literature. The authors should use caution not to overstate their original conclusions in light of the large body of work in this area.

      We thank the reviewer for pointing this out. We have corrected the above statement in the revised manuscript to indicate that we meant it in the context of trans conformational changes.

      (4) Line 446 and 434; there is a discrepancy about which amino acid is present at residue 409. Is this a K408 typo? The authors also present mutational work on K416, but this residue is not shown in the structure panel.

      We thank the reviewer for pointing this out. In the revised manuscript, we have corrected these typos.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Revisions Round 1

      Reviewer #1

      We thank the reviewer for their careful reading of our manuscript and have taken all of their grammatical corrections into account.

      Reviewer #2 (Public Review): 

      Weaknesses: 

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We have tried to correct the non-scientific language and have included the suggested data on the cryo-EM analyses, including new Figures 11-17. We did not collect data on the sample used for the seeds in the cross-seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. We have now analyzed cryo-EM data for 6 more samples at pH 7.0 and found that several kinds of polymorphs (Types 1A, 1M, 2A, 2B and 5) are accessible at this pH; however, the Type 3 polymorphs are not formed at pH 7.0 under the conditions that we used for aggregation.

      Reviewer #2 (Recommendations For The Authors): 

      - remove unscientific language: "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them"   

      We have rephrased this sentence.

      - for same reason, remove "Obviously, " 

      Done

      - What does this mean? “polymorph-unspecific” 

      Rephrased as non-polymorph-specific

      - What does this mean? "shallow amyloid energy hypersurface"  

      By “shallow hypersurface” we mean that the minimum of the multi-dimensional function that describes the energy of the amyloid is not so deep that subtle changes to the environment will not favor another fold/energy minimum. We have left the sentence because while it may not be perfect, it is concise and seems to get the point across.

      - "The results also confirm the possibility of producing disease-relevant structure in vitro." -> This is incorrect as no disease-relevant structure was replicated in this work. Use another word like “suggest”.

      We have changed to “suggest” as suggested.

      - Remove "historically" 

      Done

      - Rephrase “It has long been understood that all amyloids contain a common structural scaffold” 

      Changed to “It has long been established that all amyloids contain a common structural scaffold..” 

      - "Amyloid polymorphs whose differences lie in both their tertiary structure (the arrangement of the beta-strands) and the quaternary structure (protofilament-protofilament assembly) have been found to display distinct biological activities [8]" -> I don't think this is true, different biological activities of amyloids have never been linked to their distinct structures.

      We have added 5 new references (8-12) to support this sentence.

      - Reference 10 is a comment on reference 9; it should be removed. Instead, as for alpha-synuclein, all papers describing the tau structures should be included.  

      We have removed the reference, but feel that the addition of all Tau structure references is not merited in this manuscript since we are not comparing them.

      - Rephrase: "is not always 100% faithful"

      Removed “100%”

      - What is pseudo-C2 symmetry? Do the authors mean pseudo 2_1 symmetry (ie a 2-start helical symmetry)?

      Thank you for pointing this out. We did indeed mean pseudo 2_1 helical symmetry.

      - Re-phrase: "alpha-Syn's chameleon-like behavior" 

      We have removed this phrase.

      - "In the case of alpha-Syn, the secondary nucleation mechanism is based on the interaction of the positively charged N-terminal region of monomeric alpha-Syn and the disordered, negatively charged C-terminal region of the alpha-Syn amyloid fibrils [54]" -> I would say the mechanisms of secondary nucleation are not that well understood yet, so one may want to tune this down a bit. 

      We have changed this to “mechanism has been proposed to be”

      - The paragraphs describing experiments by others are better suited for a Discussion rather than a Results section. Perhaps re-organize this part? 

      We have left the text intact as we are using a Results and Discussion format.

      - A lot of information about Image processing seems to be missing: what steps were performed after initial model generation? 

      We have added more details in the methods section on the EM data processing and model analysis.

      - Figure 1: Where is Type 4 on the pH scale?

      We have adjusted the Fig 1 legend to clarify that pH scale is only applicable to the structures presented in this manuscript. 

      - Figure 2: This might be better incorporated as a subpanel of Figure 1.

      We agree that this figure is somewhat of a loner on its own, and we only added it in order to avoid confusion with the somewhat inconsistent naming scheme used for the Type 1B structure. However, we prefer to leave it as a separate figure so that it does not dilute the impact of Figure 1.

      - Figure 3: What is the extra density at the bottom of Type 3B from pH 5.8 samples 1 and 2. pH 5.8 + 50mM NaCl (but not pH 5.8 + 100 mM NaCl)? Could this be an indication of a local minimum and the pH 5.8 + 100 mM NaCl structure is correct? Or is this a real difference between 0/50mM NaCl and 100 mM NaCl? 

      We did not see the extra density to which the reviewer is referring; however, the images used in this panel are based on the output of 3D classification, which is more likely to produce artifacts than a 3D refinement. With this in mind, we did not see any significant differences in the refined structures and therefore only deposited the better-quality map and model for each of the polymorph types.

      - Figure 3: To what extent is Type 3B of pH 6.5 still a mixture of different types? The density looks poor. In general, in the absence of more details about the cryo-EM maps, it is hard to assess the quality of the structures presented.  

      In order to improve the quality of the images in this panel, a more complete separation of the particles from each polymorph was achieved via the filament subset selection tool in RELION 5. In each case, an unbiased initial model could be created from the 2D classes via the relion_helix_inimodel2D program, further supporting the coexistence of 4 polymorphs in the pH 6.5 sample. The particles were individually refined to produce the respective maps that are now used in this figure.

      - Many references are incorrect, containing "Preprint at (20xx)" statements.

      This has been corrected.  

      Reviewer #3 (Public Review): 

      Weaknesses: 

      (1) The authors reveal that the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations in experimental groups yield disparate abundance ratios between polymorph 3B and 3C, indicating a degree of instability in fibrillar formation. The variability would potentially pose challenges for replicability in subsequent research. In light of these situations, I propose the following recommendations:

      (a) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming, and we will continue to investigate this in our future research. Regarding the variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7, where the largest variety of polymorphs has been observed. However, even variation between identical replicates (samples created from the same protein solution and simply aggregated simultaneously in separate tubes) can lead to different outcomes (see datasets 15 and 16 in the revised Table 1), suggesting that there are stochastic processes that can determine the outcome of an individual aggregation experiment. While our data still indicate that Type 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band by Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N- and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments, we now include a batch number for each EM dataset. While no new conclusions can be drawn from the inclusion of this additional data, we feel that it is important to acknowledge the possible role of batch-to-batch variability.

      (b) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.  

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. Since the pH 7.4 conditions produce the most common a-Syn polymorph (Type 1A) and were produced twice in this manuscript (once as an unseeded and once as a cross-seeded fibrilization), we decided to focus on the intermediate condition where the most variability had been seen (pH 7.0). The revised Table 1 now has 6 new datasets (11-16) representing 6 independent aggregations at pH 7.0 starting from two different protein purification batches. The result is that we now produce the Type 2A/B polymorphs in three samples, and in two of these samples we once again observed the Type 1M polymorph. The other samples produced Type 1A or non-twisted fibrils.

      (c) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.  

      The correlation of toxicity with structure would in principle be interesting. However, the Type 1 and Type 3 polymorphs formed at pH 5.8 and 7.4 are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease-relevant structures. Still, it is rare that a single polymorph appears at pH 7.0 (the Type 5 represented only 10-20% of the fibrils in the sample, and the Type 1M sample also contained unidentified double-filament fibrils). We plan to pursue this line of research and hope to include it in a future publication.

      (2) The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodologically robust way to address this would be a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, is very important for our understanding of the aggregation pathways of alpha-synuclein and is currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds. 

      We thank the reviewer for reminding us to cite these studies as a clear example of polymorph selection by cross-seeding. Unfortunately, it is not entirely clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris pH 7.5 (although at 37 °C the pH could have dropped to about 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that pH 7.5 Tris was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results point to the fact that at pH 7.0-7.5 under low-seed conditions (0.5%) the Type 4 polymorph can propagate in a seed-specific manner.

      (3) In the Results section of "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patients under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB as well as LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in-vivo or in-cell models. Given the low seed concentration reported in these studies, it is imperative for the authors to provide a more detailed explanation as to why the possible similar conformation could lead to divergent pathologies, including differences in cell-type preference and seeding capability.  

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns and that only the former is able to model key aspects of Lewy Body disease support the seed-specific nature of some types of alpha-synuclein aggregation. We have added this to the discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      (4) In the Method section of "Image processing", the authors describe the helical reconstruction procedure, without mentioning much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should enrich this part to include more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As also suggested by reviewer #2, we have now added more comprehensive information on the 3D reconstruction and refinement process.

      (5) The abbreviation of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", the amino acids are denoted using three-letter abbreviation. Conversely, in the same section under "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, it is essential that a standardized format for amino acid abbreviations be adopted throughout the manuscript.

      That makes perfect sense and has been corrected.

      Reviewing Editor: 

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work. 

      We agree as stated above and will continue to work on this important point.

      Revisions Round 2

      Reviewer #2 (Public Review): 

      I do worry that the FSC values of model-vs-map appear to be higher than expected from the corresponding FSCs between the half-maps (e.g. see Fig 13). The implication of this observation is that the atomic models may have been overfitted in the maps, which would have led to a deterioration of their geometry. A table with rmsd on bond lengths, angles, etc would probably show this. In addition, to check for overfitting, the atomic model for each data set could be refined in one of the half-maps, and then that same model could be used to calculate 2 FSC model-vs-map curves: one against the half-map it was refined in and one against the other half-map. Deviations between these two curves are an indication of overfitting. 

      Thank you for the recommendations for model validation. We have added the suggested statistics to Table 2. We also refined each model against one of the half-maps and plotted three FSC model-vs-map curves: one against each half-map for the model refined against only one half-map, and one for the model refined against the full map. We feel that the degree of overfitting is reasonable and does not significantly impact the quality of the models.
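
      For readers less familiar with this validation, the Fourier shell correlation and the half-map cross-check discussed above can be written as follows. This is the standard textbook definition, not a formula taken from the manuscript; the symbols F_1, F_2, S_k, FSC_work and FSC_free are notation introduced here for illustration.

```latex
% F_1, F_2: Fourier transforms of the two maps being compared
% S_k: the shell of Fourier voxels q at spatial frequency |q| ~ k
\[
\mathrm{FSC}(k) =
\frac{\sum_{\mathbf{q}\in S_k} F_1(\mathbf{q})\,F_2^{*}(\mathbf{q})}
     {\sqrt{\sum_{\mathbf{q}\in S_k}\lvert F_1(\mathbf{q})\rvert^{2}\;
            \sum_{\mathbf{q}\in S_k}\lvert F_2(\mathbf{q})\rvert^{2}}}
\]
% Cross-check: refine the atomic model against half-map 1 only, then compare
\[
\mathrm{FSC}_{\mathrm{work}}(k) = \mathrm{FSC}\bigl(\text{model},\ \text{half-map 1}\bigr)(k),
\qquad
\mathrm{FSC}_{\mathrm{free}}(k) = \mathrm{FSC}\bigl(\text{model},\ \text{half-map 2}\bigr)(k)
\]
```

      A gap between the work and free curves that opens at high spatial frequency is the signature of overfitting that the reviewer describes; closely overlapping curves suggest the model has not been fitted to noise.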

      In addition, the sudden drop in the FSC curves in Figure 16 shows that something unexpected has happened to this refinement. Are the authors sure that only the procedures outlined in the Methods were used to create these curves? The unexpected nature of the FSC curve for this type (2A) raises doubts about the correctness of the reconstruction. 

      We thank the reviewer for the attention to detail. We should have caught this mistake. It turns out that in the last round of 3D refinement, the two half-maps became shifted with respect to each other in the z direction. We realigned the two maps using Chimera and then re-ran the postprocessing. The new maps have been deposited in EMD-50850. This mistake motivated us to inspect all of the maps, and we found that the same problem had occurred in the Type 3B maps. This was not noticed by the reviewer because we accidentally plotted the FSC curves from postprocessing from one refinement round before the one deposited in the EMD. We performed the same half-map shifting procedure for the Type 3B data and performed a final round of real-space refinement to produce new maps and models that have been deposited as EMD-50888 and 9FYP (superseding the previous entries).
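
      As an illustration of how a relative z shift between two half-maps could be detected, here is a minimal sketch. It is not the authors' procedure (they realigned the maps in Chimera); it assumes each map is available as a nested list indexed `[z][y][x]`, and the function names `z_profile` and `best_circular_shift` are hypothetical.

```python
def z_profile(volume):
    """Collapse a volume given as nested lists [z][y][x] to a 1D density profile along z."""
    return [sum(sum(row) for row in plane) for plane in volume]


def best_circular_shift(profile_a, profile_b):
    """Return the integer shift (in voxels) that, applied to profile_b,
    maximises its circular cross-correlation with profile_a.

    Illustrative only: real maps would need sub-voxel interpolation, and a
    helical map is periodic along z, so the shift is only defined modulo the rise.
    """
    n = len(profile_a)
    if n == 0 or n != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    a = [v - mean_a for v in profile_a]
    b = [v - mean_b for v in profile_b]
    best_shift, best_score = 0, float("-inf")
    for s in range(n):
        # Rolling b forward by s places b[(i - s) % n] at position i.
        score = sum(a[i] * b[(i - s) % n] for i in range(n))
        if score > best_score:
            best_shift, best_score = s, score
    # Report shifts larger than half the box as negative shifts.
    return best_shift if best_shift <= n // 2 else best_shift - n
```

      A non-zero result for two half-maps from the same refinement would flag the kind of misalignment described above before the FSC is computed.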

      Reviewer #3 (Public Review): 

      There are two minor points I recommend the authors to address: 

      (1) In the response to Weakness 1, point (3), the authors state that "the Type 5 represented only 10-20% of the fibrils in the sample." However, this information is not labeled in the corresponding Figure 4. I suggest the authors verify and label all relevant percentages in the figures to prevent misunderstandings. 

      We aim to be as transparent as possible, and this information was included in the main text; however, we did not label the percentage of Type 5 fibrils in Figure 4 because that would make the other percentages ambiguous. The percentages in Figure 4 represent the ratio of helical segments used for each type of refined structure in the dataset (always adding up to 100%), not the percentage of all fibrils in the dataset. That is, there are sometimes untwisted or unidentifiable fibrils in datasets, and these were not accounted for in the listed percentages. We have added a sentence to the Figure 4 legend to explain what the percentages refer to.
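
      The difference between the two normalizations can be made concrete with a small sketch. The segment counts below are invented for illustration and do not come from any dataset in the manuscript; the function name `percentages` is likewise hypothetical.

```python
def percentages(refined_segments, unassigned_segments=0):
    """Percentage of helical segments per refined polymorph.

    With unassigned_segments=0 the values sum to 100, i.e. the normalization
    over refined structures only. Passing the number of untwisted or
    unidentifiable segments gives the share of all segments in the dataset.
    """
    total = sum(refined_segments.values()) + unassigned_segments
    if total == 0:
        raise ValueError("no segments to normalize")
    return {name: 100.0 * count / total for name, count in refined_segments.items()}


# Hypothetical counts, for illustration only.
refined = {"Type 2A": 6000, "Type 2B": 4000}
among_refined = percentages(refined)                          # sums to 100
among_all = percentages(refined, unassigned_segments=10000)   # sums to 50
```

      With these made-up numbers the same polymorph reads as 60% of the refined segments but only 30% of the whole dataset, which is why mixing the two conventions in one figure would be ambiguous.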

      (2) While the authors have detailed the helical reconstruction procedure in the Methods section, it is necessary to indicate the scale bar or box size in the figure legend of the 2D representative classes to ensure clarity and reproducibility. 

      Thank you for reminding us to add the scale bars. This is now done for the 2D classes in Figures 11-17.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      A critical look at the maps and models of the various structures at this stage may prevent the authors from entering suboptimal structures into the databases.  

      We agree. Thank you for suggesting this.

      Reviewer #3 (Recommendations For The Authors): 

      The authors have responded adequately to these critiques in the revised version of the manuscript. There are two minor points. 

      (1) The authors state that "the Type 5 represented only 10-20% of the fibrils in the sample." However, this information is not labeled in the corresponding Figure 4. I suggest the authors verify and label all relevant percentages in the figures to prevent misunderstandings. 

      (2) While the authors have detailed the helical reconstruction procedure in the Methods section, it is necessary to indicate the scale bar or box size in the figure legend of the 2D representative classes to ensure clarity and reproducibility. 

      Answered in public comments

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Little is known about the local circuit mechanisms in the preoptic area (POA) that regulate body temperature. This carefully executed study investigates the role of GABAergic interneurons in the POA that express neurotensin (NTS). The principal finding is that GABA-release from these cells inhibits neighboring neurons, including warm-activated PACAP neurons, thereby promoting hyperthermia, whereas NTS released from these cells has the opposite effect, causing a delayed activation and hypothermia. This is shown through an elegant series of experiments that include slice recordings alongside matched in vivo functional manipulations. The roles of the two neurotransmitters are distinguished using a cell-type-specific knockout of Vgat as well as pharmacology to block GABA and NTS receptors. Overall, this is an excellent study that is noteworthy for revealing local circuit mechanisms in the POA that control body temperature and also for highlighting how amino acid neurotransmitters and neuropeptides released from the same cell can have opposing physiologic effects. I have only minor suggestions for revision.

      Reviewer #2 (Public Review):

      Summary:

      The study has demonstrated how two neurotransmitters and neuromodulators from the same neurons can be regulated and utilized in thermoregulation.

      The study utilized electrophysiological methods to examine the characteristics and thermoregulation of Neurotensin (Nts)-expressing neurons in the medial preoptic area (MPO). It was discovered that GABA and Nts may be co-released by neurons in MPO when communicating with their target neurons.

      Strengths:

      The study has leveraged optogenetic, chemogenetic, knockout, and pharmacological inhibitors to investigate the release process of Nts and GABA in controlling body temperature.

      The findings are relevant to those interested in the various functions of specific neuron populations and their distinct regulatory mechanisms on neurotransmitter/neuromodulator activities.

      Weaknesses:

      Key points for consideration include:

      (1) The co-release of GABA and Nts is primarily inferred rather than directly proven. Providing more direct evidence for the release of GABA and the co-release of GABA and Nts would strengthen the argument. Further in vitro analysis could strengthen the conclusion regarding this co-releasing process.

      Measurement of Nts concentrations in various brain regions during thermoregulatory responses is part of a future study.

      (2) The differences between optogenetic and chemogenetic methods were not thoroughly investigated. A comparison of in vitro results and direct observation of release patterns could clarify the mechanisms of GABA release alone or in conjunction with Nts under different stimulation techniques.

      A comparison of chemogenetic and optogenetic stimulation methods is not within the scope of this study.

      (3) Neuronal transcripts were mainly identified through PCR, and alternative methods like single-cell sequencing could be explored.

      Single cell transcriptomics of preoptic neurotensinergic neurons will be part of a different study.

      (4) In Figure 6, the impact of GABA released from Nts neurons in MPO on CBT regulation appears to vary with ambient temperatures, requiring a more detailed explanation for better comprehension.

      The different possible roles of GABA in different thermoregulatory circumstances are discussed on lines 555-581.

      (5) The model should emphasize the key findings of the study.

      The model is presented in Fig 8.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the central neural circuits regulating body temperature is critical for improving health outcomes in many disease conditions and in combating heat stress in an ever-warming environment. The authors present important and detailed new data that characterizes a specific population of POA neurons with a relationship to thermoregulation. The new insights provided in this manuscript are exactly what is needed to assemble a neural network model of the central thermoregulatory circuitry that will contribute significantly to our understanding of regulating the critical homeostatic variable of body temperature. These experiments were conducted with the expertise of an investigator with career-long experience in intracellular recordings from POA neurons. They were interpreted conservatively in the appropriate context of current literature.

      The Introduction begins with "Homeotherms, including mammals, maintain core body temperature (CBT) within a narrow range", but this ignores the frequent hypothermic episodes of torpor that mice undergo triggered by cold exposure. Although the author does mention torpor briefly in the Discussion, since these experiments were carried out exclusively in mice, greater consideration (albeit speculative) of the potential for a role of MPO Nts neurons in torpor initiation or recovery is warranted. This is especially the case since some 'torpor neurons' have been characterized as PACAP-expressing and a population of PACAP neurons represent the target of MPO Nts neurons.

      Additional discussion of a possible role of neurotensinergic neurons in the initiation or recovery from torpor is included (lines 593-597).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary:

      The authors profile gene expression, chromatin accessibility, and chromosomal architecture (by Hi-C) in activated CD4 T cells and use this information to link non-coding variants associated with autoimmune diseases with putative target genes. They find over 1000 genes physically linked with autoimmune disease loci in these cells, many of which are upregulated upon T cell activation. Focusing on IL2, they dissect the regulatory architecture of this locus, including the allelic effects of GWAS variants. They also intersect their variant-to-gene lists with data from CRISPR screens for genes involved in CD4 T cell activation and expression of inflammatory genes, finding enrichments for regulators. Finally, they showed that pharmacological inhibition of some of these genes impacts T-cell activation. 

      This is a solid study that follows a well-established canvas for variant-to-gene prioritisation using 3D genomics, applying it to activated T cells. The authors go some way in validating the lists of candidate genes, as well as exploring the regulatory architecture of a candidate GWAS locus. Jointly with data from previous studies performing variant-to-gene assignment in activated CD4 T cells (and other immune cells), this work provides a useful additional resource for interpreting autoimmune disease-associated genetic variation. 

      Suggestions for improvement:

      Autoimmune disease variants were already linked with genes in CD28-stimulated CD4 T cells using chromosome conformation capture, specifically Promoter CHi-C and the COGS pipeline (Javierre et al., Cell 2016; Burren et al., Genome Biol 2017; Yang et al., Nat Comms 2020). The authors cite these papers and present a comparative analysis of their variant-to-gene assignments (in addition to scRNA-seq eQTL-based assignments). Furthermore, they find that the Burren analysis yields a higher enrichment for gold standard genes. 

      The obvious question that the authors don't venture into is why the results are quite different. In principle, this could be due to the differences between: 

      (a) the cell stimulation procedure 

      (b) the GWAS datasets used 

      (c)  the types of assay (Hi-C vs Capture Hi-C) 

      (d) approaches for defining gene-linked regions (loops vs neighbourhoods) 

      (e) how the GWAS signals at gene-linked regions are aggregated (e.g., the flavours of COGS in Javierre and Burren vs the authors' approach)

      Re (a), I'm not sure the authors make it explicitly clear in the main text that the Capture Hi-C-based studies also use *stimulated* CD4 T cells, particularly in the section "Comparative predictive power...". So the cells used are pretty much the same, and the differences likely arise from points (b) to (e).

      It would be useful for the community to understand more clearly what is driving these differences, ideally with some added data. Could the authors, for example, take the PCHi-C data from Javierre/Burren and use their GWAS data and variant-to-gene assignment algorithms? 

      We greatly appreciate the referee’s expert assessment of our work and its value to the field, and we are glad that the referee was enthused by our comparison of the predictive power of the various V2G approaches. A point not emphasized enough in the original version of the manuscript is that we actually did harmonize the various datasets in the way the referee suggests for the precision/recall analysis. We took the contact maps presented from each paper, mapped genes using the same set of GWAS SNPs, and defined all gene-linked regions using our loop calling approach. This has been clarified in the revised version of the manuscript. We have now included a more thoughtful discussion of the possible sources of discrepancy between the different studies included in the comparison, and our thoughts on the potential sources raised by the referee are outlined below:

      (a) The modes of stimulation used are similar between studies, but timepoints and donors did vary, and ours was the only study that sorted naïve CD4+ T cells before stimulation. These aspects could represent a source of variability. 

      (b) The GWAS is not a source of variability because we re-ran the raw data from all the orthogonal studies through our V2G pipeline using the same GWAS as in the current manuscript. 

      (c) The use of HiC vs. Capture HiC is a likely source of variability. The Capture-HiC datasets included in our comparison are lower resolution (i.e. HindIII) but focus higher sequencing depth at promoters compared to our HiC datasets – i.e., Capture-HiC may mis-call loops to the wrong promoters due to lower resolution as we have shown in our previous study [Su, Human Genetics, 2021], and will miss distal SNP interactions at promoters not included in the capture set. While HiC is unbiased in this regard, HiC will fail to call some SNP-promoter loops called by CaptureHiC because the sequencing power is not specifically focused at promoters. 

      (d) For studies using neighborhood approaches, we re-ran the raw data through our loop calling algorithm to connect distal SNPs to gene promoters, and regarding (e) above, we ran the raw data through our V2G pipeline to allow a better comparison.

      In addition, given that the authors use Hi-C, a popular method for V2G prioritisation for this type of data is currently ABC (Nasser et al, Nature 2021). Could the authors provide a comparative analysis with respect to the V2G assignments in the paper and, if they see it appropriate, also run ABC-based GWAS integration on their own Hi-C data?

      This is an excellent suggestion, which we have followed in the revised version of our manuscript. It should be noted (and we do so in the text of the revision) that there is an important caveat to bringing in the ABC model. Chromosome conformation-based approaches are biologically constrained (i.e., informed) by the natural structure of chromatin in the nucleus that controls how gene transcription is regulated in cis, and it does so in a way that brings value to GWAS data. However, the ABC model further constrains the input data by imposing non-biological filters that allow the algorithm to be applied, but impose artifactual limitations that may negatively impact interpretation and discovery. In addition to filtering out pseudogenes, bidirectional RNA, antisense RNAs, and small RNAs, the ABC model gene set eliminates genes ubiquitously expressed across tissues (based on the assumption that these genes are driven primarily by elements adjacent to their promoters) and only allows annotation of one promoter per gene, even though the median number of promoters per gene in the human genome is three. In contrast, our chromatin-based V2G removes pseudogenes, but includes lincRNA and small RNAs, and includes all alternative transcription start sites annotated by gencode. 

      To apply the ABC GWAS gene nomination model to our CD4+ T cell chromatin-based V2G data, we used our ATAC-seq data and publicly available CD4+ T cell H3K27ac ChIP-seq data as input, and integrated this with GWAS and the average ENCODE-derived HiC dataset from the original ABC paper. The activity-by-contact model nominated 650 genes, compared to 1836 genes when using our cell type-matched HiC data and analysis pipeline. Only 357 of these genes were nominated by both approaches; 1479 genes nominated by our approach were not nominated by ABC, while 293 genes not implicated by our approach were newly implicated by ABC. To determine how the ABC-constrained approach performs against the HIEI gold standard set, we subjected all datasets used for the comparison depicted in the new Figure 5D to the same promoter filter used by the ABC model as part of the precision-recall re-analysis. First, we found that applying the restricted ABC model promoter annotation to all datasets did not have a large effect on recall; however, the precision of several of the datasets was affected. For example, using the restricted promoter set reduced the precision of our (Pahl) V2G approach and inflated the precision of the nearest gene to SNP metric. Second, the new precision-recall analysis shows that the ABC score-based approach is only half as sensitive at predicting HIEI genes as the chromatin-based V2G approaches. This indicates that constraining GWAS data with cell type- and state-specific 3D chromatin-based data brings more GWAS target gene predictive power than application of the multi-tissue-averaged HiC used by the ABC model. We thank the reviewer for helpful suggestions that have improved the quality of our study.
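
      For readers unfamiliar with the activity-by-contact score, the following is a minimal sketch of how it combines the three inputs named above. It assumes the definition from the original ABC publications: each element's activity (the geometric mean of its accessibility and H3K27ac signal) multiplied by its contact frequency with the gene's promoter, normalized by the sum over all candidate elements for that gene. The function names, the tuple layout and the example values are hypothetical, and this is not the authors' pipeline.

```python
import math


def activity(accessibility, h3k27ac):
    """Element activity: geometric mean of accessibility (e.g. ATAC-seq) and H3K27ac signal."""
    return math.sqrt(accessibility * h3k27ac)


def abc_scores(elements):
    """Compute ABC scores for one gene.

    `elements` is a list of (name, accessibility, h3k27ac, contact) tuples for
    the candidate elements around that gene. The caller is assumed to have
    already restricted the list to elements within the chosen window.
    """
    weighted = {
        name: activity(acc, k27) * contact
        for name, acc, k27, contact in elements
    }
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}


# Hypothetical signals for two elements near one gene.
example = abc_scores([("enhancer_1", 4.0, 9.0, 0.5), ("enhancer_2", 1.0, 1.0, 1.0)])
```

      Because the contact term enters every score, swapping a tissue-averaged contact map for a cell type-matched one can change which element-gene pairs pass a given score threshold, which is the comparison the response above turns on.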

      Reviewer #2 (Public Review): 

      Summary:

      There is significant interest in characterizing the mechanisms by which genetic mutations linked to autoimmunity perturb immune processes. Pahl et al. collect information on dynamic accessible regions, genes, and 3D contacts in primary CD4+ T cell samples that have been stimulated ex vivo. The study includes a variety of analyses characterizing these dynamic changes. With TF footprinting they propose factors linked to active regulatory elements. They compare the performance of their variant mapping pipeline that uses their data versus existing datasets. Most compelling there was a deep dive into additional study of regulatory elements nearby the IL2 gene. Finally, they perform a pharmacological screen targeting several genes they suggest are involved in T cell proliferation. 

      Strengths:

      The work done characterizing elements at the IL2 locus is impressive. 

      Weaknesses:

      Missing critical context to evaluate claims. There are extensive studies performed on resting and activated immune cell states (CD4+ T cells and other cell types) and some at multiple time points or concentrations of stimuli that collect ATAC-seq and/or RNA-seq that have been ignored by this study. How do conclusions from previous studies compare to what the authors conclude here? It is impossible to evaluate the claims without this additional context. These are a few studies I am familiar with (the authors should perform a more comprehensive search to be sure they're not ignoring existing observations) that would be important to compare/contrast conclusions:

      - Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424-431 (2018).

      - Calderon, D., Nguyen, M.L.T., Mezger, A. et al. Landscape of stimulation-responsive chromatin across diverse human immune cells. Nat Genet 51, 1494-1505 (2019).

      - Gate, R.E., Cheng, C.S., Aiden, A.P. et al. Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat Genet 50, 1140-1150 (2018).

      - Glinos, D.A., Soskic, B., Williams, C. et al. Genomic profiling of T-cell activation suggests increased sensitivity of memory T cells to CD28 costimulation. Genes Immun 21, 390-408 (2020).

      - Gutierrez-Arcelus, M., Baglaenko, Y., Arora, J. et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat Genet 52, 247-253 (2020).

      - Kim-Hellmuth, S. et al. Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations. Nat. Commun. 8, 266 (2017).

      - Ye, C. J. et al. Intersection of population variation and autoimmunity genetics in human T cell activation. Science 345, 1254665 (2014).

      - As a general point, I appreciate it when each claim includes a corresponding effect size and p-value, which helps me evaluate the strength of significance of supporting evidence. 

      We greatly appreciate the referee’s expert assessment of our work and emphasis on the value of our functional follow-up studies. Our precision-recall analyses were not meant to represent an exhaustive comparison of all prior GWAS gene nomination studies, although we agree that this could (and should) be done as part of a separate study in a future manuscript. Instead, we focused on gene nomination studies that 1) analyzed resting and activated human CD4+ T cells, 2) whose experimental design was most comparable to our own studies, and 3) had raw data readily available in the appropriate formats to allow re-analysis and harmonization before comparison. This is a point we did not make sufficiently clear in the original version of the manuscript, but have clarified in the revision. 

      Based on this rationale, we agree that the studies by Gate et al. and Ye et al. should be included in our comparative precision-recall analysis, and we have done so in the revised manuscript. The Gate study reported ATAC-seq peak co-accessibility, caQTL, eQTL, and HiC data, and we now include the resulting gene nominations from these datasets in the precision-recall analysis. These datasets performed poorly with respect to nomination of HIEI genes, likely due to small sample numbers and low sequencing depth compared to the other eQTL and chromatin capture-based studies. The eQTL reported by Ye et al. nominated 15 genes for autoimmune traits, two of which were in the ‘truth’ HIEI set (IL7R and IL2RB). This resulted in low predictive power but a high precision due to the low number of nominated genes compared to the other V2G datasets. As suggested by referee 1, we have also subjected our data to the ‘activity-by-contact’ (ABC) algorithm and have included this dataset in the comparison as well. Please see Figure 5 in the revised manuscript.
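
      The trade-off described for the Ye et al. nominations (few genes nominated, so precision is comparatively high while recall stays low) can be sketched as below. Only the counts 15 and 2 and the gene names IL7R and IL2RB come from the text above; the placeholder gene names and the size of the truth set (20) are assumptions made purely for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(nominated, truth):
    """Precision and recall of a set of nominated genes against a truth set."""
    nominated, truth = set(nominated), set(truth)
    hits = nominated & truth
    precision = len(hits) / len(nominated) if nominated else 0.0
    recall = len(hits) / len(truth) if truth else 0.0
    return precision, recall


# 15 nominated genes, two of which (IL7R, IL2RB) are in the truth set.
# The remaining names are placeholders, not real nominations.
nominated = {"IL7R", "IL2RB"} | {f"NOMINATED_{i}" for i in range(13)}

# ASSUMPTION: a truth set of 20 genes; the actual size is not stated here.
truth = {"IL7R", "IL2RB"} | {f"TRUTH_ONLY_{i}" for i in range(18)}

precision, recall = precision_recall(nominated, truth)  # 2/15 and 2/20
```

      Under these assumed sizes the precision is 2/15 regardless of how large the truth set is, whereas the recall shrinks as the truth set grows, which is why a short nomination list can score well on one axis and poorly on the other.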

      We have elected not to include data from the other studies suggested by the referee for the following reasons: The stimulation paradigm used in the Glinos study is very different from that used in other studies. Also, this study and the study by Calderon did not nominate genes. The studies by Alasoo et al. and Kim-Hellmuth et al. analyzed macrophages, which are not a comparable cell type to CD4+ T cells. The allele-specific eQTL study by Gutierrez-Arcelus et al. included the relevant cell type and activation states, but included a relatively small number of samples (24) and variants (561), and the raw data in dbGAP does not readily allow for re-analysis and harmonization with the other studies. We thank the reviewer for helpful suggestions that have improved the quality of our study.

      Reviewer #3 (Public Review): 

      Summary:

      This paper used RNAseq, ATACseq, and Hi-C to assess gene expression, chromatin accessibility, and chromatin physical associations for native CD4+ T cells as they respond to stimulation through TCR and CD28. With these data in hand, the authors identified 423 GWAS signals to their respective target genes, where most of these were not in the proximal promoter, but rather distal enhancers. The IL-2 gene was used as an example to identify new distal cis-regulatory regions required for optimal IL-2 gene transcription. These distal elements interact with the proximal IL2 promoter region. When the distal enhancer contained an autoimmune SNP, it affected IL-2 gene transcription. The authors also identified genetic risk variants that were associated with genes upon activation. Some of these regulate proliferation and cytokine production, but others are novel.

      Strengths:

      This paper provides a wealth of data related to gene expression after CD4 T cells are activated through the TCR and CD28. An important strength of this paper is that these data were intensively analyzed to uncover autoimmune disease SNPs in cis-acting regions. Many of these could be assigned to likely target genes even though they often are in distal enhancers. These findings help to provide a better understanding concerning the mechanism by which GWAS risk elements impact gene expression. 

      Another strength of this study was the proof-of-principle studies examining the IL-2 gene. Not only were new cis-acting enhancers discovered, but they were functionally shown to be important in regulating IL-2 expression, including susceptibility to colitis. Their importance was also established with respect to such distal enhancers harboring disease-relevant SNPs, which were shown to affect IL-2 transcription. 

      The data from this study were also mined against past CRISPR screens that identified genes that control aspects of CD4 T cell activation. From these comparisons, novel genes were identified that function during T cell activation. 

      Weaknesses:

      A weakness of this study is that few individuals were analyzed, i.e., RNAseq and ATACseq (n=3) and HiC (n=2). Thus, the authors may have underestimated potentially relevant risk associations by their chromatin capture-based methodology. This might account for the low overlap of their data with the eQTL-based approach or the HIEI truth set. 

      Impact:

      This study indicates that defining distal chromatin interacting regions helps to identify distal genetic elements, including relevant variants, that contribute to gene activation. 

      We greatly appreciate the referee’s expert assessment of our work and emphasis on the value of our functional follow-up studies. We have ensured that all sample sizes, effect sizes, p values and FDR statistics are included in the figures and figure legends. We agree that including more donors for the HiC studies would increase the number of implicated variants and genes. However, all the chromatin-based V2G approaches described in our manuscript use relatively small sample sizes, yet implicate more variants and genes than the comparable eQTL studies. That is, the low overlap is not driven by a paucity of GWAS-chromatin-based associations. An alternative explanation for the low overlap between GWAS-chromatin-based approaches and eQTL approaches was recently proposed by Pritchard and colleagues, who reported that GWAS and eQTL studies systematically implicate different types of variants (Mostafavi et al., Nature Genetics 2023). Among other differences, eQTL tend to implicate nearby genes while GWAS variants implicate distant genes, and our results support this contention. We referred to this study in the original version of the manuscript, but have included a more extensive discussion of potential explanations in the revised version. We thank the reviewer for helpful suggestions that have improved the quality of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study investigates associations between retrotransposon element expression and methylation with age and inflammation, using multiple public datasets. The study is valuable because a systematic analysis of retrotransposon element expression during human aging has been lacking. However, the data provided are incomplete due to the sole reliance on microarray expression data for the core analysis of the paper. 

      Both reviewers found this study to be important. We have selected the microarray datasets of human blood adopted by a comprehensive study of ageing published in a Nature Communications manuscript (DOI: 10.1038/ncomms9570). We only included the datasets specifically collected for ageing studies. Therefore, the large RNA-seq cohorts for cancer, cardiovascular, and neurological diseases were not relevant to this study and cannot be included.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Tsai and Seymen et al. investigate associations between RTE expression and methylation and age and inflammation, using multiple public datasets. The concept of the study is in principle interesting, as a systematic analysis of RTE expression during human aging is lacking. 

      We thank the reviewer for the positive comment. 

      Unfortunately, the reliance on expression microarray data, used to perform the core analysis of the paper places much of the study on shaky ground. The findings of the study would not be sufficiently supported until the authors validate them with more suitable methods. 

      In the Discussion section of the manuscript, we have clarified that “we are aware of the limitations imposed by using microarray in this study, particularly the low number of intergenic probes in the expression microarray data. Our study can be enriched with the advent of large RNA-seq cohorts for aging studies in the future.” However, the application of microarrays to RTE expression analysis was introduced previously (DOI: 10.1371/journal.pcbi.1002486) and has been applied in several highly cited and important publications (DOI: 10.1038/ncomms1180, DOI: 10.1093/jnci/djr540). In fact, in a manuscript by Reichmann et al. (DOI: 10.1371/journal.pcbi.1002486), which has been cited 76 times, the authors showed and experimentally verified that cryptic repetitive element probes present in Illumina and Affymetrix gene expression microarray platforms can accurately and sensitively monitor repetitive element expression. Inspired by this methodological manuscript, which has been reasonably well accepted by other researchers, we trusted that the RTE microarray probes could accurately quantify RTE expression at class and family levels.

      Strengths: 

      This is a very important biological problem. 

      Weaknesses: 

      RNA microarray probes are obviously biased to genes, and thus quantifying transposon analysis based on them seems dubious. Based on how arrays are designed there should at least be partial (perhaps outdated evidence) that the probe sites overlap a protein-coding or non-coding RNA. 

      We disagree with the reviewer that quantifying transposon expression based on microarray data is dubious. As previously shown by Reichmann et al., the quantification is reliable as long as the probes do not overlap with annotated genes and they are in the correct orientation to detect sense repetitive element transcripts. Reichmann et al. identified 1,400 repetitive element probes in version 1.0, version 1.1 and version 2.0 of the Illumina Mouse WG-6 Beadchips by comparing the genomic locations of the probes with the RepeatMasked regions of the mouse genome. We applied the same criteria to Illumina Human HT-12 V3 (29,431 probes) and V4 (33,963 probes) to identify the RTE-specific probes.
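
      The selection criteria described above can be sketched as a simple interval filter. This is a hedged illustration, not the authors' or Reichmann et al.'s pipeline: the `Interval` type, the function names, the coordinates, and the choice to represent "correct orientation" as the probe and repeat annotation sharing a strand are all assumptions made for the example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def select_rte_probes(probes, repeats, genes):
    """Map probe name -> repeat name for probes that pass both criteria.

    A probe is kept if (1) it overlaps no annotated gene interval and
    (2) it overlaps a repeat annotated on the same strand (assumed here to
    mean the probe detects the sense repeat transcript).
    """
    selected = {}
    for probe in probes:
        if any(overlaps(probe, gene) for gene in genes):
            continue  # criterion 1 failed: probe sits on an annotated gene
        sense_hits = [r for r in repeats
                      if overlaps(probe, r) and r.strand == probe.strand]
        if sense_hits:
            selected[probe.name] = sense_hits[0].name
    return selected


# Toy annotations (hypothetical coordinates and names).
repeats = [Interval("chr1", 100, 400, "+", "L1_example"),
           Interval("chr1", 900, 1200, "-", "Alu_example")]
genes = [Interval("chr1", 850, 1000, "+", "GENE_X")]
probes = [Interval("chr1", 150, 200, "+", "probe_sense"),
          Interval("chr1", 250, 300, "-", "probe_antisense"),
          Interval("chr1", 950, 1000, "-", "probe_on_gene")]
rte_probes = select_rte_probes(probes, repeats, genes)
print(rte_probes)
```

      The nested loops are quadratic in the number of annotations; for genome-scale inputs an interval tree or a sorted sweep would be the usual replacement.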

      The authors state they only used intergenic probes, but based on supplementary files, almost half of RTE probes are not intergenic but intronic (n=106 out of 264). 

      All our identified RTE probes overlap with intergenic regions. However, due to their repetitive nature, some probes overlap with intronic regions, too. We have replaced "intergenic" with "non-coding" in our resubmission to show that they do not overlap with the exons of protein-coding genes. However, we do not rule out the possibility that some of our detected RTE probes might overlap non-coding RNAs. In fact, the border between coding and non-coding genomes has recently become very fuzzy with new annotations of the genome. RTE RNAs can be easily considered as non-coding RNAs if we challenge our traditional junk DNA view.

      This is further complicated by the fact that not all this small subset of probes is available in all analyzed datasets. For example, 232 probes were used for the MESA dataset but only 80 for the GTP dataset. Thus, RTE expression is quantified with a set of probes which is extremely likely to be highly affected by non-RTE transcripts and that is also different across the studied datasets. Differences in the subsets of probes could very well explain the large differences between datasets in multiple of the analyses performed by the authors, such as in Figure 2a, or 3a. It is nonetheless possible that the quantification of RTE expression performed by the authors is truly interpretable as RTE expression, but this must be validated with more data from RNA-seq. Above all, microarray data should not be the main type of data used in the type of analysis performed by the authors. 

      In this study, we did not compare MESA with GTP etc. We have analysed each dataset separately based on the available data for that dataset. Therefore, sacrificing one analysis because of the lack of information from the other does not make sense. We would do that if we were after comparing different datasets. Moreover, the datasets are not comparable because they were collected from different types of blood samples. 

      Reviewer #2 (Public Review): 

      Summary: 

      Yi-Ting Tsai and colleagues conducted a systematic analysis of the correlation between the expression of retrotransposable elements (RTEs) and aging, using publicly available transcriptional and methylome microarray datasets of blood cells from large human cohorts, as well as single-cell transcriptomics. Although DNA hypomethylation was associated with chronological age across all RTE biotypes, the authors did not find a correlation between the levels of RTE expression and chronological age. However, expression levels of LINEs and LTRs positively correlated with DNA demethylation, and inflammatory and senescence gene signatures, indicative of "biological age". Gene set variation analysis showed that the inflammatory response is enriched in the samples expressing high levels of LINEs and LTRs. In summary, the study demonstrates that RTE expression correlates with "biological" rather than "chronological" aging. 

      Strengths: 

      The question the authors address is both relevant and important to the fields of aging and transposon biology. 

      We thank the reviewer for finding this study relevant and important.

      Weaknesses: 

      The choice of methodology does not fully support the primary claims. Although microarrays can detect certain intergenic transposon sequences, the authors themselves acknowledge in the Discussion section that this method's resolution is limited. More critical considerations, however, should be addressed when interpreting the results. The coverage of transposon sequences by microarrays is not only very limited (232 unique probes) but also predetermined. This implies that any potential age-related overexpression of RTEs located outside of the microarray-associated regions, or of polymorphic intact transposons, may go undetected. Therefore, the authors should be more careful while generalising their conclusions. 

      This is a bioinformatics study, and we have already admitted and discussed the limitations in the discussion section of this manuscript. All technologies have their own limitations, and this should not stop us from shedding light on scientific facts because of inadequate information. In the manuscript, we have discussed that all large and proper ageing studies were performed using microarray technology. Peters et al. (DOI: 10.1038/ncomms9570) adopted all these datasets in their transcriptional landscape of ageing manuscript, which was used in previous studies of ageing as well. Our study essentially applies the Reichmann et al. method to the peripheral blood-related data from the Peters et al. manuscript. Since hypomethylation due to ageing is a well-established and broad epigenetic reprogramming, it is unlikely that only a fraction of RTEs is affected by this phenomenon. Therefore, the subsampling of RTEs should not substantially affect the result. Indeed, this is supported in our study by the inverse correlation between DNA methylation and RTE expression for LINE and SINE classes despite the limited numbers of probes for LINE and SINE expression.

      Additionally, for some analyses, the authors pool signals from RTEs by class or family, despite the fact that these groups include subfamilies and members with very different properties and harmful potentials. For example, while sequences of older subfamilies might be passively expressed through readthrough transcription, intact members of younger groups could be autonomously reactivated and cause inflammation. The aggregation of signals by the largest group may obscure the potential reactivation of smaller subgroups. I recommend grouping by subfamily or, if not possible due to the low expression scores, by subgroup. For example, all HERV subfamilies are from the ERVL family. 

      We agree with the reviewer that different subfamilies of RTEs play different roles through their activation. However, we will lose our statistical power if we study RTE subfamilies with a few probes. Global epigenetic alteration and derepression of RTEs by ageing have been observed to be genome-wide. While our systematic analysis across RTE classes and families cannot capture alterations in subfamilies due to statistical power, it is still relevant to the research question we are addressing.

      Next, Illumina arrays might not accurately represent the true abundance of TEs due to nonspecific hybridization of genomic transposons. Standard RNA preparations always contain traces of abundant genomic SINEs unless DNA elimination is specifically thorough. The problem of such noise should be addressed. 

      We have checked the RNA isolation step from MESA, GTP, and GARP manuscripts. The total RNA was isolated using the Qiagen mini kit following the manufacturer’s recommendations. The authors of these manuscripts did not mention whether they eliminated genomic DNA, but we assumed they were aware of the DNA contamination and eliminated it based on the manufacturer’s recommendations. We have looked up the literature about nonspecific hybridization of RTEs but could not find any evidence to support this observation. We would appreciate the reviewers providing more evidence about such RTE contaminations.

      Lastly, scRNAseq was conducted using 10x Genomics technology. However, quantifying transposons in 10x sequencing datasets presents major challenges due to sparse signals. 

      Applying the scTE pipeline (https://www.nature.com/articles/s41467-021-21808-x), we have found that the statistical power of quantifying RTE classes (LINE, SINE, and LTR) or RTE families (L1, L2, Alu, ERVK, etc.) is as good as that for individual genes. However, our proposed method cannot analyse RTE subfamilies, and we did not attempt to do so.

      Smart-seq single-cell technology is better suited to this particular purpose. 

      We agree with the reviewer that Smart-seq provides higher yield than 10x, but there are no Smart-seq data available for ageing studies.

      Anyway, it would be more convincing if the authors demonstrated TE expression across different clusters of immune cells using standard scRNAseq UMAP plots instead of boxplots. 

      Since the number of RTE reads per cell is low, showing the expression of RTEs per cell in UMAP may not be the best statistical approach to show the difference between the aged and young groups. This is why we chose to analyse with Pseudobulk and displayed differential expression using boxplot rather than UMAP for each immune cell type. 
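
      The pseudobulk step mentioned above can be illustrated with a minimal sketch: sparse per-cell counts are summed within each donor and cell type, so that downstream comparisons between age groups operate on aggregated profiles rather than on individual cells. The function name, the tuple layout of the input, and the toy counts are assumptions for illustration and do not reproduce the authors' pipeline.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts within each (donor, cell_type) group.

    `cells` is an iterable of (donor, cell_type, counts) tuples, where
    `counts` maps a feature name (e.g. an RTE family) to a read count.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for donor, cell_type, counts in cells:
        for feature, n in counts.items():
            totals[(donor, cell_type)][feature] += n
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {group: dict(features) for group, features in totals.items()}


# Toy input (hypothetical donors, cell types, and counts).
cells = [
    ("donor_young", "monocyte", {"L1": 1, "Alu": 0}),
    ("donor_young", "monocyte", {"L1": 0, "Alu": 2}),
    ("donor_old", "monocyte", {"L1": 3}),
    ("donor_old", "T cell", {"L1": 1, "ERVK": 1}),
]
bulk = pseudobulk(cells)
print(bulk)
```

      In practice the aggregated counts would still need library-size normalisation before any differential expression test; that step is omitted here.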

      I recommend validating the data by RNAseq, even on small cohorts. Given that the connection between RTE overexpression and inflammation has been previously established, the authors should consider better integrating their observations into the existing knowledge. 

      Please see below. We have analysed RNA-seq data suggested by Reviewer 1 in the Recommendations for the Authors section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I can recommend two sizeable human PBMC RNA-seq datasets that the authors could use:

      Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access). There are likely other suitable datasets that I am not aware of. I would also recommend using identical sets of probes to quantify RTE expression across studies. If certain datasets have too few probes and would thus limit the number of probes available across all studies it might be a good idea to exclude the dataset, especially if the analysis has been supplemented by the additional RNA-seq datasets. 

      Until recently, there was no publicly available, non-cancerous, large cohort of RNA-seq data for ageing studies. We tried to gain access to the two RNA-seq datasets suggested by Reviewer 1: Marquez et al. 2020 (phs001934.v1.p1, controlled access) and Morandini et al. 2023 (GSE193141, public access).

      Unfortunately, the Marquez et al. 2020 data are not accessible to us because the authors only provide the data for projects related to cardiovascular diseases. However, we did analyse the Morandini et al. 2023 data, and we can confirm that no association was observed between any class or family of RTEs and chronological ageing (Author response image 1), which is the second strong piece of evidence supporting the statement in the manuscript. However, as expected, we found a positive correlation between RTE expression and IFN-I signature score (Author response image 2).

      Author response image 1.

      Linear analysis of RTE expression and chronological age.

      Author response image 2.

      Linear analysis of RTE expression and IFN gene signature expression.

      The authors use "biological age" and inflammation as interchangeable concepts, including in the title. Please correct this wording. 

      We have now added a new term to the manuscript, “biological age-related (BAR)”, which clearly addresses this distinction. We do not think it is necessary to change the title.

      The authors find correlations between RTE expression and age-associated gene signatures but not chronological age itself. This is puzzling because, as the wording suggests, the expression of these inflammatory pathways is age-associated. If RTE expression correlates with inflammation which itself correlates with age, one might expect RTE expression to also correlate with age. Do the authors see a correlation between various inflammatory gene signatures and chronological age, in the analyzed datasets? If yes, then how would you explain that discrepancy? Moreover, in this case, I would recommend using a linear model, rather than correlation, to separate the effects of chronological age and RTE expression on inflammation (Inflammation et al ~ Age + RTE expression), or equivalent designs.

      As described above, we have now introduced the BAR terminology, which resolves this confusion. We did not find a correlation between RTE expression and chronological age. However, we did identify the correlation between BAR gene signatures and RTE expression.

      To separate the effects of chronological age and RTE expression on BAR gene signature scores, we performed a generalized linear model (GLM) analysis using BAR gene signature scores as response variables and RTE expression and chronological age as predictors (BAR gene signature scores ~ RTE expression + chronological age). A significant association was observed between BAR gene signature scores and RTE expression in the GARP cohort (Author response image 3). However, when chronological age is considered as a predictor, we did not identify a correlation between chronological age and BAR gene signatures, indicating that BAR events are not correlated with chronological age (Author response image 3).

      Author response image 3.

      Generalized linear models (GLM) analysis (BAR gene signature scores ~ RTE expression + chronological age). For each RTE family, we separately performed GLM. Age (RTE family) indicates the chronological age when used in the design formula for that specific RTE family. 
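
      For readers who want to see the shape of such a model, the design formula above (score ~ RTE expression + chronological age) can be sketched as follows. The response does not state the GLM family or the software used, so this example assumes a Gaussian family with identity link, which reduces to ordinary least squares; the function name and the synthetic data are likewise assumptions, not the authors' code or results.

```python
import numpy as np


def fit_linear_model(score, rte, age):
    """Least-squares fit of score ~ intercept + rte + age.

    Equivalent to a Gaussian GLM with identity link (an assumption; the
    family used in the original analysis is not specified).
    """
    design = np.column_stack([np.ones(len(score)), rte, age])
    coef, *_ = np.linalg.lstsq(design, np.asarray(score, dtype=float), rcond=None)
    return dict(zip(["intercept", "rte", "age"], coef))


# Synthetic data constructed so that the score depends on RTE expression only.
rte = [0.0, 1.0, 2.0, 3.0, 4.0]
age = [30.0, 50.0, 40.0, 70.0, 60.0]
score = [1.0 + 2.0 * r for r in rte]
coefficients = fit_linear_model(score, rte, age)
print(coefficients)
```

      Fitting both predictors in one model is what lets the age coefficient be read as the effect of chronological age after accounting for RTE expression. Standard errors and p values would come from a statistics package rather than from this bare fit.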

      Some of the gene sets used by the authors have considerable overlap with others and are also not particularly comprehensive. I can recommend this very comprehensive gene set: https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/SAUL_SEN_MAYO.  

      We chose not to use large gene lists such as the suggested SEN_MAYO list, as we found that Singscore struggles to generate reliable scores with sufficient variance when the number of genes increases to more than twenty. Although there is some overlap between inflammation-related genes and cellular senescence genes (e.g., IL6, IL1A, IL1B), it is important to note that each gene list focuses on different aspects of biological aging and should not be dismissed as redundant.

      Minor comments: 

      Overall, several sentences in the manuscript feel somewhat unnatural. I would recommend further proofreading. I will mention some examples:  

      Thank you for your feedback. We have fixed all these issues in the new submission.  

      • On line 34, "like the retroviruses" should be "like retroviruses". There are several other places in the text where "the" is not required.

      Fixed.

      • On line 86, "to generate the RTE expression". "the" is again not necessary and I would replace "generate" with "quantify". 

      Fixed.

      • On line 86, "we mapped the probe locations to RepeatMasker". RepeatMasker is not a genome. Do you mean you mapped the probe location to a genome annotated by RepeatMasker? The same applies to line 99.  

      Fixed. We changed the sentence to: “To quantify RTE expression, we mapped the microarray probe locations to RTE locations in RepeatMasker to extract the list of noncoding (intergenic or intronic) probes that cover the RTE regions.”

      • Figure 1 contains a typo in the aims section: "evetns" instead of "events".  

      Fixed.

      • On line 495 "filtered out" seems to imply you removed intergenic probes. I assume you mean that you specifically selected intergenic probes.

      Fixed.

      • Figure 1 nicely summarizes your datasets. Could you add a Figure 1b panel showing how you used RNA arrays to quantify RTE expression? This should include the number of probes for each RTE family, so I suggest merging this with Figure S1.  

      We disagree with the reviewer's suggestion to merge Figure 1 and Figure S1, because they address two different concepts.

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 2c, it is unclear what colour scale has been used for age. 

      Thank you for the comment. We have added a legend for age in this figure.

      There are no figure legends for Supplementary Figures 1 to 5 and all figures after Supplementary Figure 8. 

      A new version with legends has been submitted.

      For different datasets used, the choice of "healthy" patients should be more clear and explicit.

      Are asymptomatic patients with autoimmune inflammatory disorders considered as "healthy"? If not only healthy patients' blood is analysed (such as PBMCs from primary osteoarthrosis), how can the authors exclude that the inflammatory signature enrichment discovered in this study is associated not just with "biological age" but with the disease itself?

      In our analysis, we did not exclusively study "healthy" individuals, as none of our datasets were initially collected from strictly healthy populations. While the microarray datasets were not specifically collected from people with particular diseases, they were also not screened for asymptomatic conditions. To demonstrate the same pattern in healthier cohorts, we added scRNA-seq analysis of confirmed healthy individuals to our study. However, the focus of this study is not on healthy aging. Instead, it is on biological ageing that includes both healthy and non-healthy ageing.

      We included the GARP (primary osteoarthritis) dataset as it is a cohort of age-related diseases (ARD). While we cannot definitively attribute inflammatory signatures enrichment to biological aging or disease, the observation of such enrichment in a cohort of ARD is worth considering. To make this clearer, we have replaced the term “healthy” with “non-cancerous” for microarray analysis throughout the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ever-improving techniques allow the detailed capture of brain morphology and function to the point where individual brain anatomy becomes an important factor. This study investigated detailed sulcal morphology in the parieto-occipital junction. Using cutting-edge methods, it provides important insights into local anatomy, individual variability, and local brain function. The presented work advances the field and will stimulate future research into this important area.

      Strengths:

      Detailed, very thorough methodology. Multiple raters mapped detailed sulci in a large cohort. The identified sulcal features and their functional and behavioural relevance are then studied using various complementary methods. The results provide compelling evidence for the importance of the described sulcal features and their proposed relationship to cortical brain function.

      We thank the Reviewer for highlighting the strengths of our methods and findings.

      Weaknesses:

      A detailed description/depiction of the various sulcal patterns is missing.

      We agree that adding these details for the newly described sulci is necessary and have now done so. These details are included in the Results (Page 6):

      “Beyond characterizing the incidence of sulci, it is also common in the neuroanatomical literature to qualitatively characterize sulci on the basis of fractionation and intersection with surrounding sulci (termed “sulcal types”; for examples in other cortical expanses, see Chiavaras & Petrides, 2000; Drudik et al., 2023; Miller et al., 2021; Paus et al., 1996; Weiner et al., 2014; Willbrand, Parker, et al., 2022). All four sulci most commonly did not intersect with other sulci (see Supplementary Tables 1-4 for a summary of the sulcal types of the slocs and pAngs dorsal and ventral components). The sulcal types were also highly comparable between hemispheres (rs > .99, ps < .001).”

      And in four new Supplementary Tables.

      A possible relationship between sulcal morphology and individual demographics might provide more insight into anatomical variability.

      We have conducted additional analyses to relate sulcal incidence to demographic features (age and gender). These results are included on Pages 5-6:

      “Given that sulcal incidence and patterning is also sometimes related to demographic features (Cachia et al., 2021; Leonard et al., 2009; Wei et al., 2017), subsequent GLMs relating the incidence and patterning of the three more variable sulci (slocs-d, pAngs-v, and pAngs-d) to demographic features (age and gender) revealed no associations for any sulcus (ps > .05).”

      The unique dataset offers an opportunity to provide insights into laterality effects that should be explored.

      We included hemisphere as a factor in all models for this exact reason. Throughout the paper, we have edited the text to ensure that these laterality effects are more apparent to readers.

      Further, we have a Supplementary Results section on hemispheric effects regarding the slocs-v, cSTS3, and lTOS:

      “Hemispheric asymmetries in morphological, architectural, and functional features with regards to the slocs-v, cSTS3, and lTOS comparison

      We observed a sulcus x metric x hemisphere interaction on the morphological and architectural features of the slocs-v (F(4.20, 289.81) = 4.16, η2 = 0.01, p = .002; the cSTS3 is discussed in the next section). Post hoc tests showed that this interaction was driven by  the slocs-v being cortically thinner in the left than the right hemisphere (p < .001; Fig. 2a).

      There was also a sulcus x network x hemisphere interaction on the functional connectivity profiles (using functional connectivity parcellations from (Kong et al., 2019)) of the slocs-v and lTOS (F(32, 2144) = 3.99, η2 = 0.06, p < .001; the cSTS3 is discussed in the next section). Post hoc tests showed that this interaction was driven by three effects: (i) the slocs-v overlapped more with the Default C subnetwork in the left than the right hemisphere (p = .013), (ii) the lTOS overlapped more with Visual A subnetwork in the right than the left hemisphere (p = .002), and (iii) the lTOS overlapped more with the Visual B subnetwork in the left than the right hemisphere (p = .002; Fig. 2b).”

      As well as the other STS rami on morphology:

      “It is also worth noting that there was a sulcus x metric x hemisphere interaction (F(4, 284.12) = 6.60, η2 = 0.08, p < .001). Post hoc tests showed that: (i) the cSTS3 was smaller (p < .001) and thinner (p = .025) in the left than the right hemisphere (Supplementary Fig. 8a), (ii) the cSTS2 was shallower (p = .004) and thicker (p < .001) in the right than left hemisphere (Supplementary Fig. 8a), and (iii) the cSTS1 was shallower (p < .001), smaller (p = .002), thinner (p = .001), and less myelinated (p < .001) in the left than the right hemisphere (Supplementary Fig. 8a).”

      And functional connectivity of the STS rami:

      “There was also a sulcus x network x hemisphere interaction (F(32, 2208) = 12.26, η2 = 0.15, p < .001). Post hoc tests showed differences for each cSTS component. Here, the cSTS1 overlapped more with the Auditory network (p < .001), less with the Control B subnetwork (p < .001), more with the Control C subnetwork (p < .001), less with the Default B subnetwork (p < .001), more with the Default C subnetwork (p < .001), more with the Ventral Attention B subnetwork (p < .001), and more with the Visual A subnetwork (p = .024) in the right than in the left hemisphere (Supplementary Fig. 8b). In addition, the cSTS2 overlapped more with the Control B subnetwork (p < .001), more with the Control C subnetwork (p < .001), less with the Default B subnetwork (p < .001), and less with the Temporal-Parietal network (p = .011) in the right than in the left hemisphere (Supplementary Fig. 8b). Finally, the cSTS3 overlapped more with the Control B subnetwork (p = .002), less with the Default B subnetwork (p = .014), more with the Default C subnetwork (p = .022), and less with the Ventral Attention B subnetwork (p = .029) in the right than in the left hemisphere (Supplementary Fig. 8b).”

      Reviewer #2 (Public Review):

      Summary: After manually labeling 144 human adult hemispheres in the lateral parieto-occipital junction (LPOJ), the authors 1) propose a nomenclature for 4 previously unnamed highly variable sulci located between the temporal and parietal or occipital lobes, 2) focus on one of these newly named sulci, namely the ventral supralateral occipital sulcus (slocs-v) and compare it to neighboring sulci to demonstrate its specificity (in terms of depth, surface area, gray matter thickness, myelination, and connectivity), 3) relate the morphology of a subgroup of sulci from the region including the slocs-v to the performance in a spatial orientation task, demonstrating behavioral and morphological specificity. In addition to these results, the authors propose an extended reflection on the relationship between these newly named landmarks and previous anatomical studies, a reflection about the slocs-v related to functional and cytoarchitectonic parcellations as well as anatomic connectivity and an insight about potential anatomical mechanisms relating sulcation and behavior.

      Strengths:

      - To my knowledge, this is the first study addressing the variable tertiary sulci located between the superior temporal sulcus (STS) and intraparietal sulcus (IPS).

      - This is a very comprehensive study addressing altogether anatomical, architectural, functional and cognitive aspects.

      - The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      - The comparison of different features between the slocs-v and similar sulci is useful to demonstrate their difference.

      - The detailed comparison of the present study with state of the art contextualizes and strengthens the novel findings.

      - The functional study complements the anatomical description and points towards cognitive specificity related to a subset of sulci from the LPOJ.

      - The discussion offers a proposition of theoretical interpretation of the findings.

      - The data and code are mostly available online (raw data made available upon request).

      We thank the Reviewer for highlighting the strengths of our methods, analyses, and applications of our findings.

      Weaknesses:

      - While three independent raters labeled all hemispheres, one single expert finalized the decision. Because no information is reported on the inter-rater variability, this somehow equates to a single expert labeling the whole cohort, which could result in biased labellings and therefore affect the reproducibility of the new labels.

      To expedite the process of defining thousands of sulci at the individual level in multiple regions, our group does not use an approach amenable to calculating inter-rater agreement. Instead, our method consists of a two-tiered procedure. Here, authors YT and TG defined sulci, which were then checked by a trained expert (EHW). These were then checked again by the senior author (KSW). We emphasize that this process has produced reproducible anatomical results in other regions such as posteromedial cortex (Willbrand et al., 2023 Science Advances; Willbrand et al., 2023 Communications Biology; Maboudian et al., 2024 The Journal of Neuroscience), ventral temporal cortex (Weiner et al., 2014 NeuroImage; Miller et al., 2020 Scientific Reports; Parker et al., 2023 Brain Structure and Function), and lateral prefrontal cortex (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Brain Structure and Function; Willbrand et al., 2023 The Journal of Neuroscience) across age groups, species, and clinical populations. Further, in the Supplemental Materials we provide post mortem images showing that these sulci exist outside of cortical reconstructions, supporting this updated sulcal schematic of the lateral parieto-occipital junction. For the present study, by the time the final tier of our method was reached, we emphasize that a very small percentage (~2%) of sulcal definitions were actually modified. We will include an exact percentage in future publications in LPC/LPOJ.
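
      For readers who want to quantify agreement between labeling passes, the following is a minimal sketch and not the procedure used in this study. It assumes each sulcal label is available as a collection of surface vertex indices; the function names and the toy labels are hypothetical and serve only to illustrate a Dice overlap between two labels and the share of labels changed between two passes.

```python
def dice_coefficient(label_a, label_b):
    """Dice overlap between two sulcal labels given as collections of vertex indices."""
    a, b = set(label_a), set(label_b)
    if not a and not b:
        return 1.0  # both passes agree that the sulcus is absent
    return 2 * len(a & b) / (len(a) + len(b))


def fraction_modified(first_pass, final_pass):
    """Share of sulcal labels whose vertex set differs between two labeling passes.

    Both arguments map a label name to a collection of vertex indices.
    """
    names = set(first_pass) | set(final_pass)
    if not names:
        return 0.0
    changed = sum(
        1
        for name in names
        if set(first_pass.get(name, ())) != set(final_pass.get(name, ()))
    )
    return changed / len(names)


# Hypothetical toy labels (vertex indices are made up for illustration).
first_pass = {"slocs-v": [1, 2, 3], "cSTS3": [7, 8]}
final_pass = {"slocs-v": [1, 2, 3], "cSTS3": [7, 8, 9]}

overlap = dice_coefficient(first_pass["cSTS3"], final_pass["cSTS3"])
share_changed = fraction_modified(first_pass, final_pass)
```

      A Dice value near 1 indicates that two passes selected nearly the same vertices, while the second function gives a per-label summary comparable in spirit to the percentage of modified definitions mentioned above.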

      - 3 out of the 4 newly labeled sulci are only described in the very first part and never reused. This should be emphasized as it is far from obvious at first glance of the article.

      We have edited the Abstract (shown below, on Page 1) and the paper throughout to emphasize the focus on the slocs-v over the other three sulci.

      “After defining thousands of sulci in a young adult cohort, we revised the previous LPC/LPOJ sulcal landscape to include four previously overlooked, small, shallow, and variable sulci. One of these sulci (ventral supralateral occipital sulcus, slocs-v) is present in nearly every hemisphere and is morphologically, architecturally, and functionally dissociable from neighboring sulci. A data-driven, model-based approach, relating sulcal depth to behavior further revealed that the morphology of only a subset of LPC/LPOJ sulci, including the slocs-v, is related to performance on a spatial orientation task.”

      It is worth noting that we have added additional analyses that include the other three newly-characterized sulci in response to Reviewer 1. We first described the relationship between these sulci and demographic features, alongside analyses on the patterning of these sulci, which are included in the Results (Page 6):

      “Beyond characterizing the incidence of sulci, it is also common in the neuroanatomical literature to qualitatively characterize sulci on the basis of fractionation and intersection with surrounding sulci (termed “sulcal types”; for examples in other cortical expanses, see Chiavaras & Petrides, 2000; Drudik et al., 2023; Miller et al., 2021; Paus et al., 1996; Weiner et al., 2014; Willbrand, Parker, et al., 2022). All four sulci most commonly did not intersect with other sulci (see Supplementary Tables 1-4 for a summary of the sulcal types of the slocs and pAngs dorsal and ventral components). The sulcal types were also highly comparable between hemispheres (rs > .99, ps < .001). Though we characterize these sulci in this paper for the first time, the location of these four sulci is consistent with the presence of variable “accessory sulci” in this cortical expanse mentioned in prior modern and classic studies (Supplementary Methods). We could also identify these sulci in post-mortem hemispheres (Supplementary Figs. 2, 3), ensuring that these sulci were not an artifact of the cortical reconstruction process.

      Given that sulcal incidence and patterning is also sometimes related to demographic features (Cachia et al., 2021; Leonard et al., 2009; Wei et al., 2017), subsequent GLMs relating the incidence and patterning of the three more variable sulci (slocs-d, pAngs-v, and pAngs-d) to demographic features (age and gender) revealed no associations for any sulcus (ps > .05). Finally, to help guide future research on these newly and previously classified LPC/LPOJ sulci, we generated probabilistic maps of each of these 17 sulci and share them with the field with the publication of this paper (Supplementary Fig. 6; Data availability).”

      - The tone of the article suggests a discovery of these 4 sulci when some of them have already been reported (as rightfully highlighted in the article), though not named nor studied specifically. This is slightly misleading as I interpret the first part of the article as a proposition of nomenclature rather than a discovery of sulci.

      We have toned down our language throughout the paper, emphasizing that this paper updates the sulcal landscape of LPC/LPOJ by taking into account these sulci that have not been comprehensively described previously. For example, in the Abstract (Page 1), we now write:

      “After defining thousands of sulci in a young adult cohort, we revised the previous LPC/LPOJ sulcal landscape to include four previously overlooked, small, shallow, and variable sulci. One of these sulci (ventral supralateral occipital sulcus, slocs-v) is present in nearly every hemisphere and is morphologically, architecturally, and functionally dissociable from neighboring sulci. A data-driven, model-based approach, relating sulcal depth to behavior further revealed that the morphology of only a subset of LPC/LPOJ sulci, including the slocs-v, is related to performance on a spatial orientation task.”

      - The article never mentions the concept of merging of sulcal elements and the potential effect it could have on the labeling of the newly named variable sulci.

      We emphasize that we use multiple surfaces (pial, inflated, smoothwm) to help distinguish intersecting sulci from one another. We include extra text in the Methods (Page 21):

      “We defined LPC/LPOJ sulci for each participant based on the most recent schematics of sulcal patterning by Petrides (2019) as well as pial, inflated, and smoothed white matter (smoothwm) FreeSurfer cortical surface reconstructions of each individual. In some cases, the precise start or end point of a sulcus can be difficult to determine on a surface (Borne et al., 2020); however, examining consensus across multiple surfaces allowed us to clearly determine each sulcal boundary in each individual.”

      Further, upon quantifying the patterning of these variable sulci, a majority of the time they are independent (described in the Results on Page 6):

      “Beyond characterizing the incidence of sulci, it is also common in the neuroanatomical literature to qualitatively characterize sulci on the basis of fractionation and intersection with surrounding sulci (termed “sulcal types”; for examples in other cortical expanses, see Chiavaras & Petrides, 2000; Drudik et al., 2023; Miller et al., 2021; Paus et al., 1996; Weiner et al., 2014; Willbrand, Parker, et al., 2022). All four sulci most commonly did not intersect with other sulci (see Supplementary Tables 1-4 for a summary of the sulcal types of the slocs and pAngs dorsal and ventral components). The sulcal types were also highly comparable between hemispheres (rs > .99, ps < .001).”

      Thus, merging sulcal elements likely had a minimal impact on the present definitions.
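
      As an illustration of how the interhemispheric comparison of sulcal types could be computed, the sketch below correlates the frequency of each sulcal type in the left hemisphere with its frequency in the right hemisphere. This is an assumption about one reasonable approach rather than a description of our exact analysis; the `pearson_r` helper and the counts are hypothetical and are not taken from the Supplementary Tables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up counts of hemispheres per sulcal type, in the same order for both
# hemispheres (e.g., independent, intersecting sulcus A, intersecting sulcus B, other).
left_counts = [60, 8, 3, 1]
right_counts = [58, 9, 4, 1]

r_between_hemispheres = pearson_r(left_counts, right_counts)
```

      When one sulcal type (here, the independent type) dominates in both hemispheres, the correlation across types will be close to 1, which is the pattern described in the quoted Results text.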

      - The definition of the new sulci is solely based on their localization relative to other sulci which are themselves variable (e.g. the 3rd branch of the STS can show different locations and different orientation, potentially affecting the definition of the slocs-v). This is not addressed in the discussion.

      As displayed in our probabilistic maps of these sulci (Supplementary Fig. 6), the cSTS components (2-4) are actually relatively consistent between individuals, and thus, future investigators can utilize these maps to help define these sulci in new hemispheres.
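
      To make the notion of a probabilistic map concrete, here is a minimal sketch under stated assumptions: every individual label has already been projected to a common template surface so that vertex indices correspond across hemispheres, and the map value at a vertex is simply the proportion of hemispheres whose label contains that vertex. The function name and toy labels are hypothetical, and this is not presented as the exact pipeline used to generate Supplementary Fig. 6.

```python
from collections import Counter


def probability_map(labels_per_hemisphere):
    """Vertex-wise proportion of hemispheres whose label includes each vertex.

    labels_per_hemisphere: list of collections of vertex indices, one per
    hemisphere, all expressed on the same template surface (an assumption).
    Vertices absent from the returned dict have probability 0.
    """
    n_hemispheres = len(labels_per_hemisphere)
    if n_hemispheres == 0:
        raise ValueError("at least one label is required")
    counts = Counter()
    for vertices in labels_per_hemisphere:
        counts.update(set(vertices))  # count each vertex once per hemisphere
    return {vertex: count / n_hemispheres for vertex, count in counts.items()}


# Hypothetical toy labels for one sulcus in four hemispheres.
toy_labels = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [3]]
toy_map = probability_map(toy_labels)
```

      In such a map, vertices with values near 1 mark locations where the sulcus is found in nearly every hemisphere, which is why a thresholded map can serve as a starting point for labeling new hemispheres.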

      Nevertheless, there is, of course, individual variability in the location of these sulci, and we do agree that this point brought up by the Reviewer is important. We have now added text to the Limitations section of the Discussion (Pages 15-16):

      “The main limitation of our study is that presently, the most accurate methodology to define sulci (especially the small, shallow, and variable PTS) requires researchers to manually trace each structure on the cortical surface reconstructions. This method is limited due to the individual variability of cortical sulcal patterning (Fig. 1, Supplementary Fig. 5), which makes it challenging to identify sulci, let alone PTS, without extensive experience and practice. However, we anticipate that our probabilistic maps will provide a starting point and hopefully, expedite the identification of these sulci in new participants. This method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Allen et al., 2022; Gratton et al., 2022; Naselaris et al., 2021; Rosenberg and Finn, 2022).”

      - The new sulci are only defined in terms of localization relative to other sulci, and no other property is described (general length, depth, orientation, shape...), making it hard for a new observer to take labeling decisions in case of conflict.

      To guide future investigators, we now show these metrics for all sulci in Supplemental Figure 7 so that general morphology can assist in identifying these sulci.

      - The very assertive tone of the article conveys the idea that these sulci are identifiable certainly in most cases, when by definition these highly variable tertiary sulci are sometimes very difficult to take decisions on.

      The highly variable nature of three of the four putative tertiary sulci described here (slocs-v, slocs-d, pAngs-v, pAngs-d) is why we focused on the slocs-v (as it is identifiable in nearly all hemispheres). However, we have edited our language throughout the text to also emphasize the variability of these sulci. For example, in the Results (Page 5), we now write:

      “In previous research in small sample sizes, neuroanatomists noticed shallow sulci in this cortical expanse (Supplementary Methods and Supplementary Figs. 1-4 for historical details). In the present study, we fully update this sulcal landscape considering these overlooked indentations. In addition to defining the 13 sulci previously described within the LPC/LPOJ, as well as the posterior superior temporal cortex (Methods) (Petrides, 2019) in individual participants, we could also identify as many as four small and shallow PTS situated within the LPC/LPOJ that were highly variable across individuals and uncharted until now (Supplementary Methods and Supplementary Figs. 1-4). Macroanatomically, we could identify two sulci between the cSTS3 and the IPS-PO/lTOS ventrally and two sulci between the cSTS2 and the pips/IPS dorsally. We focus our analyses on the slocs-v since it was identifiable in nearly every hemisphere.”

      - I am not absolutely convinced with the labeling proposed of a previously reported sulcus, namely the posterior intermediate parietal sulcus.

      In defining previously identified LPC sulci, we followed the previous labeling procedure by Petrides (2019) alongside historical definitions (detailed in Supplementary Figures 1-4). Nevertheless, future deep learning algorithms using these and other data can be used to rectify discrepancies in labeling (e.g., Borne et al., 2020 Medical Image Analysis; Lyu et al., 2021 NeuroImage). We discuss these points in the Limitations section of the Discussion (Pages 16-17):

      “The main limitation of our study is that presently, the most accurate methodology to define sulci (especially the small, shallow, and variable PTS) requires researchers to manually trace each structure on the cortical surface reconstructions. This method is limited due to the individual variability of cortical sulcal patterning (Fig. 1, Supplementary Fig. 5), which makes it challenging to identify sulci without extensive experience and practice. However, we anticipate that our probabilistic maps will provide a starting point and hopefully, expedite the identification of these sulci in new participants. This should accelerate the process of subsequent studies confirming the accuracy of our updated schematic of LPC/LPOJ. This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Allen et al., 2022; Gratton et al., 2022; Naselaris et al., 2021; Rosenberg & Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (e.g., Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand, Parker, et al., 2022; Willbrand, Voorhies, et al., 2022; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lyu et al., 2021). Finally, the time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      Assuming that the labelling of all sulci reported in the article is reproducible, the different results are convincing and in general, this study achieves its aims in defining more precisely the sulcation of the LPOJ and looking into its functional/cognitive value. This work clearly offers a finer understanding of sulcal pattern in this region, and lacks only little for the new markers to be convincingly demonstrated. An overall coherence of the labelling can still be inferred from the supplementary material which support the results and therefore the conclusions, yet, addressing some of the weaknesses listed above would greatly enhance the impact of this work. This work is important to the understanding of sulcal variability and its implications on functional and cognitive aspects.

      We thank the Reviewer for their positive remarks on the implications of this work.

      Reviewer #3 (Public Review):

      Summary: 72 subjects, and 144 hemispheres, from the Human Connectome Project had their parietal sulci manually traced. This identified the presence of previously undescribed shallow sulci. One of these sulci, the ventral supralateral occipital sulcus (slocs-v), was then demonstrated to have functional specificity in spatial orientation. The discussion furthermore provides an eloquent overview of our understanding of the anatomy of the parietal cortex, situating their new work into the broader field. Finally, this paper stimulates further debate about the relative value of detailed manual anatomy, inherently limited in participant numbers and areas of the brain covered, against fully automated processing that can cover thousands of participants but easily misses the kinds of anatomical details described here.

      Strengths:

      - This is the first paper describing the tertiary sulci of the parietal cortex with this level of detail, identifying novel shallow sulci and mapping them to behaviour and function.

      - It is a very elegantly written paper, situating the current work into the broader field.

      - The combination of detailed anatomy and function and behaviour is superb.

      We thank the Reviewer for their positive remarks on the paper and our findings.

      Weaknesses:

      - The numbers of subjects are inherently limited both in number as well as in typically developing young adults.

      We emphasize that the sample size is limited due to the arduous nature of manually defining sulci; however, we provide probabilistic maps with the publication of this work to help expedite this process for future investigators. Further, with improved deep learning algorithms, the sample sizes in future neuroanatomical studies should be enhanced. We discuss these points in the Limitations section of the Discussion (Pages 16-17):

      “The main limitation of our study is that presently, the most accurate methodology to define sulci (especially the small, shallow, and variable PTS) requires researchers to manually trace each structure on the cortical surface reconstructions. This method is limited due to the individual variability of cortical sulcal patterning (Fig. 1, Supplementary Fig. 5), which makes it challenging to identify sulci without extensive experience and practice. However, we anticipate that our probabilistic maps will provide a starting point and hopefully, expedite the identification of these sulci in new participants. This should accelerate the process of subsequent studies confirming the accuracy of our updated schematic of LPC/LPOJ. This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Allen et al., 2022; Gratton et al., 2022; Naselaris et al., 2021; Rosenberg & Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (e.g., Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand, Parker, et al., 2022; Willbrand, Voorhies, et al., 2022; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lyu et al., 2021). The time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      - While the paper begins by describing four new sulci, only one is explored further in greater detail.

      Due to the increased variability of three of the four newly-classified sulci, we chose to only focus on the slocs-v given that it was present in nearly all hemispheres. In response to other reviewers, we have conducted additional analyses that also describe these new sulci and potential factors related to their incidence (Page 6):

      “Given that sulcal incidence and patterning is also sometimes related to demographic features (Cachia et al., 2021; Leonard et al., 2009; Wei et al., 2017), subsequent GLMs relating the incidence and patterning of the three more variable sulci (slocs-d, pAngs-v, and pAngs-d) to demographic features (age and gender) revealed no associations for any sulcus (ps > .05).”

      In addition, given that sulcal variability is cognitively (e.g., Amiez et al., 2018 Scientific Reports; Cachia et al., 2021 Frontiers in Neuroanatomy; Garrison et al., 2015 Nature Communications; Willbrand et al., 2022, 2023 Brain Structure & Function), anatomically (e.g., Amiez et al., 2021 Communications Biology; Vogt et al., 1995 Journal of Comparative Neurology), functionally (e.g., Lopez Persem et al., 2019 The Journal of Neuroscience), and translationally (e.g., Yucel et al., 2002 Biological Psychiatry) relevant, future research can investigate these relationships regarding the slocs-d and pAngs components. We have added text to the Limitations section of the Discussion (Pages 17-18) to discuss this:

      “Finally, although we did not focus on the relationship between the other three PTS (slocs-d, pAngs-v, and pAngs-d) to anatomical and functional features of LPC and cognition, given that variability in sulcal incidence is cognitively (Amiez et al., 2018; Cachia et al., 2021; Garrison et al., 2015; Willbrand, Jackson, et al., 2023; Willbrand, Voorhies, et al., 2022), anatomically (Amiez et al., 2021; Vogt et al., 1995), functionally (Lopez-Persem et al., 2019), and translationally (Clark et al., 2010; Le Provost et al., 2003; Meredith et al., 2012; Nakamura et al., 2020; Yücel et al., 2002, 2003) relevant, future work can also examine the relationship between the more variable slocs-d, pAngs-v, and pAngs-d and these features.”

      - There is some tension between calling the discovered sulci new vs acknowledging they have already been reported, but not named.

      We have edited the manuscript throughout to emphasize our primary focus on revising the LPC/LPOJ sulcal landscape to include these often overlooked small, shallow, and variable putative tertiary sulci, rather than using the terms “discovered sulci” and “new.”

      - The anatomy of the sulci, as opposed to their relation to other sulci, could be described in greater detail.

      Beyond the radar plots in the main text which compare specific groupings of sulci, we now show the morphological metrics for all sulci investigated in the present work in Supplemental Figure 7.

      Overall, to summarize, I greatly enjoyed this paper and believe it to be a highly valued contribution to the field.

      We are glad the Reviewer enjoyed reading our paper and thank them for their positive thoughts on the potential impact of this work on the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The slocs-v is found in 71 subjects left and right. Is that the same subject?

      No, these are different subjects.

      (2) How were the 72 subjects chosen?

      The subjects were randomly selected from the HCP database, as described in the Methods (Page 18):

      “Here, we used 72 randomly-selected participants, balanced for gender (following the terminology of the HCP data dictionary), from the HCP database (50% female, 22-36 years old, and 90% right-handed; there was no effect of handedness on our behavioral tasks; Supplementary materials) that were also analyzed in several prior studies (Hathaway et al., 2023; Miller et al., 2021, 2020; Willbrand et al., 2023b, 2023c, 2022a).”

      (3) Are there effects of laterality on sulcal pattern? Table?

      We now include sulcal pattern results in the Results section and Supplementary Materials; there were no laterality effects on the sulcal pattern.

      (4) Depiction/description of common sulcal patterns

      We now include sulcal pattern results in the Results section and Supplementary Materials.

      (5) Is there a relationship between sulcal patterns and demographic features?

      We now include analyses on this in the Results section. There is no relationship between sulcal patterns and demographic features.

      (6) Just for clarity, the sulcal features are studied and extracted in native space?

      Yes, sulcal features are studied and extracted in native space, as described in the Methods section (Page 19):

      “Anatomical T1-weighted (T1-w) MRI scans (0.8 mm voxel resolution) were obtained in native space from the HCP database. Reconstructions of the cortical surfaces of each participant were generated using FreeSurfer (v6.0.0), a software package used for processing and analyzing human brain MRI images (surfer.nmr.mgh.harvard.edu) (Dale et al., 1999; Fischl et al., 1999). All subsequent sulcal labeling and extraction of anatomical metrics were calculated from these native space reconstructions generated through the HCP’s version of the FreeSurfer pipeline (Glasser et al., 2013).”

      (7) The authors use "Gender". Are they referring to biological sex (female/male) or socially defined characteristics (man/woman etc.)?

      The term gender refers to socially defined characteristics, following the usage of the HCP data dictionary (Methods, Page 18):

      “Here, we used 72 randomly-selected participants, balanced for gender (following the terminology of the HCP data dictionary), from the HCP database (50% female, 22-36 years old, and 90% right-handed; there was no effect of handedness on our behavioral tasks; Supplementary materials) that were also analyzed in several prior studies (Hathaway et al., 2023; Miller et al., 2021, 2020; Willbrand et al., 2023b, 2023c, 2022a).”

      (8) Fig 2. Grey is poorly visible compared to green and blue.

      The shade of gray has been edited to be more distinguishable.

      (9) The relationship between behavior and sulcal features is significant but weak.

      We acknowledge that the morphological-behavioral relationship identified in the present study explains a modest amount of variance; however, the more important aspect of the finding is that multiple sulci identified in the model are recently characterized sulci in LPC/LPOJ identified by our group and others (Petrides, 2019), and thus, the relationship would have been overlooked or lost if these sulci were not identified. We have added text to the Limitations section of the Discussion (Pages 17-18) to emphasize this point:

      “It is also worth noting that the morphological-behavioral relationship identified in the present study explains a modest amount of variance; however, the more important aspect of our findings is that multiple sulci identified in our model-based approach are recently-characterized sulci in LPC/LPOJ identified by our group and others (Petrides, 2019), and thus, the relationship would have been overlooked or lost if these sulci were not identified.”

      (10) The Limitation section could be expanded.

      We have added additional text to flesh out the Limitations section of the Discussion (Pages 17-18):

      “It is also worth noting that the morphological-behavioral relationship identified in the present study explains a modest amount of variance; however, the more important aspect of our findings is that multiple sulci identified in our model-based approach are recently-characterized sulci in LPC/LPOJ identified by our group and others (Petrides, 2019), and thus, the relationship would have been overlooked or lost if these sulci were not identified. Finally, although we did not focus on the relationship between the other three PTS (slocs-d, pAngs-v, and pAngs-d) to anatomical and functional features of LPC and cognition, given that variability in sulcal incidence is cognitively (Amiez et al., 2018; Cachia et al., 2021; Garrison et al., 2015; Willbrand, Jackson, et al., 2023; Willbrand, Voorhies, et al., 2022), anatomically (Amiez et al., 2021; Vogt et al., 1995), functionally (Lopez-Persem et al., 2019), and translationally (Clark et al., 2010; Le Provost et al., 2003; Meredith et al., 2012; Nakamura et al., 2020; Yücel et al., 2002, 2003) relevant, future work can also examine the relationship between the more variable slocs-d, pAngs-v, and pAngs-d and these features.”

      Reviewer #2 (Recommendations For The Authors):

      First, I would like to thank the authors for their important contribution to the field of sulcal studies and anatomo-functional correlates. My main comments about the work are treated in the public review, and I will only address details in this section. I have detected a number of typos which are harder to report from a document in which lines are not numbered. Could you please submit a numbered document for the next iteration?

      - p2. "hominoid-specific, shallow indentations, or sulci" - can lead to misunderstanding that sulci are hominoid-specific and shallow

      Sentence has been rewritten:

      “Of all the neuroanatomical features to target, recent work shows that morphological features of the shallower, later developing, hominoid-specific indentations of the cerebral cortex (also known as putative tertiary sulci, PTS) are not only functionally and cognitively meaningful, but also are particularly impacted by multiple brain-related disorders and aging (Amiez et al., 2019, 2018; Ammons et al., 2021; Cachia et al., 2021; Fornito et al., 2004; Garrison et al., 2015; Harper et al., 2022; Hathaway et al., 2023; Lopez-Persem et al., 2019; Miller et al., 2021, 2020; Nakamura et al., 2020; Parker et al., 2023; Voorhies et al., 2021; Weiner, 2019; Willbrand et al., 2023b, 2023c, 2022a, 2022b; Yao et al., 2022).”

      - p2. next sentence (starting with "The combination [...]": not clear that you are addressing tertiary sulci here, maybe introduce the concept beforehand?

      The previous sentence (just above) has been edited to introduce putative tertiary sulci beforehand.

      - p5. error in numbering of sulci relative to Fig1. (5,6,7,8 -> 6,7,8,9)

      Sulcal numbering has been fixed.

      -p5. reference to supp mat -> I would have expected the nomenclature used in Borne et al. 2020 to be discussed alongside with the state of the art. How would you relate F.I.P.r.int.1 and F.I.P.r.int.2 to the sulci you describe?

      We thank the Reviewer for bringing up this relevant literature. The F.I.P.r.int.1 and F.I.P.r.int.2 are described as rami of the IPS, whereas the slocs and pAngs are independent, small indentations near the IPS, but not part of the complex. Nevertheless, future work should integrate these two schematics to establish the most comprehensive sulcal map of LPC/LPOJ. We have added text to the Supplementary Methods detailing the differences between the F.I.P.r.int.1 and F.I.P.r.int.2 and the slocs/pAngs:

      “slocs/pAng vs. F.I.P.r.int.1 and F.I.P.r.int.2

      Recent work (Borne et al., 2020; Perrot et al., 2011) identified two intermediate rami of the IPS (F.I.P.r.int.1 and F.I.P.r.int.2) that were not defined in the present investigation. Crucially, the newly classified sulci here (slocs and pAngs) are distinguishable from the two F.I.P.r.int. in that the F.I.P.r.int. are branches coming off the main body of the IPS (Borne et al., 2020; Perrot et al., 2011), whereas the slocs/pAngs are predominantly non-intersecting (“free”) structures that never intersected with the IPS (Supplementary Tables 1-4).”

      - p6. Fig 1.a. labelling discrepancy between line 1 and 2, column 4: the labels 10 and 11 from the inflated hemisphere do not match the labels 10 and 11 in the pial surface. Fig 1.b. swapped label 2 and 3 in the 4th hemisphere

      These aspects of Figure 1 have been edited accordingly.

      - p7. "(iii) the slocs-v was thicker than both the cSTS3 and lTOS" -> the slocs-v showed thicker gray matter?

      The sentence has been adjusted (Page 7):

      “(iii) the slocs-v showed thicker gray matter than both the cSTS3 and lTOS (ps < .001),”

      - p9. Six left hemisphere LPC/LPOJ sulci were related to spatial orientation task performance -> missing

      Fixed (Page 9):

      “Six left hemisphere LPC/LPOJ sulci were related to spatial orientation task performance (Fig. 3a, b).”

      - p14. "Steel and colleagues" -> missing space

      Fixed (Page 14):

      “Furthermore, the slocs-v appears to lie at the junction of scene-perception and place-memory activity (a transition that also consistently co-localizes with the HCP-MMP area PGp) as identified by Steel and colleagues (2021).”

      - p20. Probability maps "we share these maps with the field" -> specify link to data availability

      The link to data availability has been added (Page 21):

      “To aid future studies interested in investigating LPC/LPOJ sulci, we share these maps with the field (Data availability).”

      Reviewer #3 (Recommendations For The Authors):

      No detailed recommendations not already present in the rest of the review.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      In this paper, Kalidindi and Crevecoeur ask why sequential movements are sometimes coarticulated. To answer this question, first, they modified a standard optimal controller to perform consecutive reaches to two targets (T1 and T2). They investigated the optimal solution with and without a constraint on the endpoint's velocity in the via target (T1). They observed that the controller coarticulates the movements only when there is no constraint on the speed at the via-point. They characterized coarticulation in two ways: First, T2 affected the curvature of the first reach in unperturbed reaches. Second, T2 affected corrective movements in response to a mechanical perturbation of the first reach. 

      Parallel to the modeling work, they ran the same experiment on human participants. The participants were instructed to either consider T1 as via point (go task) or to slow down in T1 and then continue to T2 (stop task). Mirroring the simulation results, they observed coarticulation only in the go task. Interestingly, in the go task, when the initial reach was occasionally perturbed, the long-latency feedback responses differed for different T2 targets, suggesting that the information about the final target was already present in the motor circuits that mediate the long-latency response. In summary, they conclude that coarticulation in sequential tasks depends on instruction, and when coarticulation happens, the corrections in earlier segments of movement reflect the entirety of the coarticulated sequence.

      Evaluation 

      Among many strengths of this paper, most notably, the results and the experiment design are grounded in, and guided by the optimal control simulation. The methods and procedures are appropriate and standard. The results and methods are explained sufficiently and the paper is written clearly. The results on modulation of long-latency response based on future goals are interesting and of broad interest for future experiments on motor control in sequential movement. However, I find the authors' framing of these results, mostly in the introduction section, somewhat complicated.

      The current version of the introduction motivates the study by suggesting that "coarticulation and separation of sub-movement [in sequential movements] have been formulated as distinct hypotheses" and this apparent distinction, which led to contradictory results, can be resolved by Optimal Feedback Control (OFC) framework in which task-optimized control gains control coarticulation. This framing seems complicated for two main reasons. First, the authors use chunking and coarticulation interchangeably. However, as originally proposed by (Miller 1956), the chunking of the sequence items may fully occur at an abstract level like working memory, with no motoric coarticulation of sequence elements at the level of motor execution. In this scenario, sequence production will be faster due to the proactive preparation of sequence elements. This simple dissociation between chunking and coarticulation may already explain the apparent contradiction between the previous works mentioned in the introduction section. Second, the authors propose the OFC as a novel approach for studying neural correlates of sequence production. While I agree that OFC simulations can be highly insightful as a normative model for understanding the importance of sequence elements, it is unclear to me how OFCs can generate new hypotheses regarding the neural implementation of sequential movements. For instance, if the control gains are summarizing the instruction of the task and the relevance of future targets, it is unclear in which brain areas, or how these control gains are implemented. I believe the manuscript will benefit from making points more clear in the introduction and the discussion sections. 

      We agree that chunking may occur at different levels that do not necessarily involve motor coarticulation. We clarified that our contribution is towards answering why sequence movements sometimes coarticulate, and how the way sequences are executed influences the representation of future goals in the sensorimotor system.

      To address this point, we made the following modifications in the introduction:

      Line 44:

      “It remains unclear how future goals are integrated in the sensorimotor system. For rapid execution of a sequence, one possible solution is to represent multiple goals within low-level control circuits (3, 16), enabling the execution of several elements as a single entity, called “motor chunk”. Note that chunking can also occur at a higher level such as in working memory-guided sequences, which in this case may or may not involve the production of a movement (17, 18).”

      Line 50:

      “Recent neural recordings in the primary motor cortex (M1) have shown no specific influence of future goals on the population responses governing ongoing action (19, 20). Specifically, Zimnik and Churchland (20) observed in a two-reach sequence task that there was no coarticulation in sub-movement kinematics, although execution became faster with practice. Notably, M1 displayed separate phases of execution-related activity for each sub-movement. Using a neural network model, they interpreted that sequence goals could be separated and serially specified to the controller from regions upstream of M1 (Figure 1A). These findings contrast with earlier studies showing coarticulation of sub-movements and whole sequence representations in M1 (21–23). As a result, it has been suggested that coarticulation and separation in rapid sequences may involve distinct computations: coarticulation possibly involves replacing sub-movements with a motor chunk, while separation possibly indicates independent control of each sub-movement with chunking at a higher level (4, 20). Thus, there are unresolved questions regarding why sequential movements sometimes coarticulate, and how the representation of future goals in the sensorimotor system influences the way sequences are executed.”

      With respect to the second part of your concern about OFC, we agree that this framework does not make direct predictions about the neural implementation, and that our statements required clarification. The first link between the model and predictions about neural data follows from the observation that long-latency circuits participate in task-dependent sequence production, thus indicating that transcortical pathways must express this task dependency. The second link between our work and neural activity is that it provides a counterargument to a previous interpretation: Zimnik and Churchland argued that independent or “holistic” sequence production should be associated with different representations in the monkey brain. In contrast, we suggest that the same controller can flexibly generate both kinds of sequences, without implying a different structure in the controller, only a different cost function. We thus refine the expectation about neural correlates of sequence representations by showing that they potentially relate to the encoding of task constraints.

      To address this point, we made the following changes to the Introduction and Discussion:

      Line 69 in Introduction: 

      “The theory of optimal feedback control (OFC) has been particularly useful in predicting the influence of numerous task parameters on the controller (27–34), thus reproducing goal-directed motor commands during both unperturbed movements and feedback responses to disturbances (30). OFC has been used in numerous studies to interpret flexible feedback responses occurring in the long-latency response period (30, 35).” 

      Line 454 in Discussion:

      “Although OFC has been predominantly used as a behavioral level framework agnostic to neural activity patterns, it can shed light on the planning, state estimation and execution related computations in the transcortical feedback pathway (Takei et al.,). Using OFC, our study proposes a novel and precise definition of the difference to expect in neural activities in order to identify coarticulated versus independent sequence representations from a computational point of view. Because each condition (i.e., overlapping versus non-overlapping controllers as in Figure 2) was associated with different cost-functions and time-varying control gains, it is the process of deriving these control gains, using the internal representation of the task structure, that may differ across coarticulated and separated sequence conditions. To our knowledge, how and where this operation is performed is unknown. A corollary of this definition is that the preparatory activity (20, 50) may not discern independently planned or coarticulated sequences because these situations imply different control policies (and cost functions), as opposed to different initial states. Moreover, the nature of the sequence representation is potentially not dissociable from its execution for the same reason.”

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors examine the question of whether discrete action sequences and coarticulated continuous sequential actions can be produced from the same controller, without having to derive separate control policies for each sequential movement. Using modeling and behavioral experiments, the authors demonstrate that this is indeed possible if the constraints of the policy are appropriately specified. These results are of interest to those interested in motor sequences, but it is unclear whether these findings can be interpreted to apply to the control of sequences more broadly (see weaknesses below). 

      Strengths: 

      The authors provide an interesting and novel extension of the stochastic optimal control model to demonstrate how different temporal constraints can lead to either individual or coarticulated movements. The authors use this model to make predictions about patterns of behavior (e.g., in response to perturbations), which they then demonstrate in human participants both by measuring movement kinematics as well as EMG. Together this work supports the authors' primary claims regarding how changes in task instructions (i.e., task constraints) can result in coarticulated or separated movement sequences and the extent to which the subsequent movement goal affects the planning and control of the previous movement. 

      Weaknesses: 

      I reviewed a prior version of this manuscript, and appreciate the authors addressing many of my previous comments. However, there are some concerns, particularly with regard to how the authors interpret their findings. 

      We thank the reviewer for their continued assessment of our work and for helping us to improve the paper. We are convinced that this and the previous review helped us clarify our work considerably.

      (1) It would be helpful for the authors to discuss whether they think there is a fundamental distinction between a coarticulated sequence and a single movement passing through a via point (or equivalently, avoiding an obstacle). The notion of a coarticulated sequence brings with it the notion of sequential (sub)movements and temporal structure, whereas the latter can be treated as more of a constraint on the production of a single continuous movement. If I am interpreting the authors' findings correctly it seems they are suggesting that these are not truly different kinds of movements at the level of a control policy, but it would be helpful for the authors to clarify this claim. 

      Indeed, this is our interpretation of the results and simulations. This suggestion can also be found in the article by Ramkumar et al. on chunking. To clarify this, we added the following statement to the Discussion:

      Line 449: 

      “Notably, in the framework of optimal feedback control, an intermediate goal is equivalent to a via-point that constrains the execution of the sequence (similar to (13)). It is thus possible that coarticulation in motor systems is processed similarly to other kinds of movement constraints, such as via-points, avoiding obstacles, or changes in control policies.”

      (2) The authors' model clearly shows that each subsequent target only influences the movement of one target back, but not earlier ones (page 7 lines 199-204). This stands in contrast to the paper they cite from Kashefi 2023, in which those authors clearly show that people account for at least 2 targets in the future when planning/executing the current movement. It would be useful to know whether this distinction arises because of a difference in experimental methodology, or because the model is not capturing something about human behavior.  

      Thank you for raising this point. There are some differences between the study of Kashefi and colleagues (2023) and ours. Both studies looked into the planning of more than one reach. In the study of Kashefi et al., the results of Figure 6 showed that in the H2 condition there was no significant curvature, and that the curvature increased in the H3 and H4 conditions (only in the 75 ms dwell-time scenario). Note that the H2 condition in their work meant the presentation of the +2 target after the initiation of the +1 reach. Hence, we think the GO task in our case should be compared to the H3 condition, which results in curvature similar to that in our study. These authors also showed that curvature increased even in the H4 condition (75 ms dwell). OFC also accommodates this observation, if we consider the relationship between the cost of intermediate goals and the spatial location of the targets (see figure below, also added to Supplementary Figure 4). To see this, we performed additional three-target simulations where the constraint on intermediate goal velocity (at T1 and T2) was varied to achieve similar dwell velocity at the intermediate targets (Supplementary Figure 4C). In this case, the hand curvature of the first reach differed while the dwell velocity was similar across T3 up and T3 down conditions, as may be instructed experimentally. Again, the task instructions and the spatial location of the future goals together determine how much the first reach components are influenced by the next ones, and this may impact several reaches ahead.

      We added the following clarification to the Results to describe this.

      Line 199:

      “It is worth noting that the OFC model can be generalized to longer sequences (10) through the incorporation of additional cost terms (in Equation 10 of Methods) and targets, enabling simultaneous planning for more than two targets. Simulations of a sample three-reach sequence (Supplementary Figure S4) revealed that varying the cost of dwell velocity at intermediate targets (w2 and w3 parameters in Methods) caused a variation in control gains. Different amounts of change in control gains can be expected for intermediate versus late targets (Supplementary Figure S4A). Notably, even when we used the same dwell velocity cost (w2 = w3 = 0), the observed velocity profiles were different between the two sequences towards different final targets (T3 up and T3 down) (Supplementary Figure S4B). We tested a condition in which both sequence reaches were forced to have similar dwell velocity profiles by increasing the dwell velocity costs in the sequence towards one of the targets (T3 down), while leaving this parameter unchanged for the other target (T3 up). In this scenario, the T3 up sequence had the parameters (w2, w3) = (0, 0), while the T3 down sequence had the parameters (0.8, 0.8). In this case, the curvature of the first reach was different, and predominantly occurred due to differences in K2 between the two sequence reaches (Supplementary Figure S4C). These simulations highlight that planning for a longer-horizon sequence can indirectly influence the curvature of early reaches, due to the interaction between intermediate dwell constraints, spatial arrangement of targets, and sequence horizon in a task-dependent manner.”
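      The role of the dwell-velocity cost described above can be illustrated with a much simpler toy problem. The sketch below is not the authors' stochastic OFC model: it is a deterministic, open-loop least-squares plan for the lateral axis of a point mass, with a via-point (T1) at lateral position 0 and a final target (T2) offset to one side. All names (`plan_lateral`, `w_via_vel`, and the numeric weights) are hypothetical choices made for illustration only; `w_via_vel` merely plays a role analogous to a dwell-velocity weight.

```python
import numpy as np

def plan_lateral(t2_x, w_via_vel, n_steps=100, via_step=50, dt=0.01,
                 r=1e-6, w_pos=100.0, w_end_vel=100.0):
    """Lateral path of a unit point mass that minimises a quadratic cost.

    Cost (all weights are illustrative assumptions):
      r * sum(u^2)                      effort
      + w_pos     * p[via_step]^2       pass through T1 (lateral position 0)
      + w_via_vel * v[via_step]^2       penalty on velocity at T1
      + w_pos     * (p[end] - t2_x)^2   reach T2
      + w_end_vel * v[end]^2            stop at T2
    Returns the positions p[0..n_steps].
    """
    # Position and velocity are linear in the control sequence u:
    # p[t] = pos_map[t] @ u and v[t] = vel_map[t] @ u (start at rest at 0).
    pos_map = np.zeros((n_steps + 1, n_steps))
    vel_map = np.zeros((n_steps + 1, n_steps))
    for k in range(n_steps):
        pos_map[k + 1] = pos_map[k] + dt * vel_map[k]
        vel_map[k + 1] = vel_map[k]
        vel_map[k + 1, k] += dt

    # Stack the weighted residuals and solve one linear least-squares problem.
    rows = np.vstack([
        np.sqrt(r) * np.eye(n_steps),
        np.sqrt(w_pos) * pos_map[[via_step]],
        np.sqrt(w_via_vel) * vel_map[[via_step]],
        np.sqrt(w_pos) * pos_map[[n_steps]],
        np.sqrt(w_end_vel) * vel_map[[n_steps]],
    ])
    rhs = np.concatenate([
        np.zeros(n_steps), [0.0], [0.0], [np.sqrt(w_pos) * t2_x], [0.0],
    ])
    u, *_ = np.linalg.lstsq(rows, rhs, rcond=None)
    return pos_map @ u

if __name__ == "__main__":
    mid_first_reach = 25  # halfway between start and T1
    for label, w in [("no velocity cost at T1", 0.0),
                     ("velocity cost at T1", 100.0)]:
        for t2 in (0.1, -0.1):
            dev = plan_lateral(t2, w)[mid_first_reach]
            print(f"{label}, T2 at {t2:+.1f} m: lateral deviation {dev:+.5f} m")
```

      In this toy setting, the first reach bends away from the side of T2 when no velocity penalty is applied at T1, and stays essentially straight when the penalty is large, because a near-stop at T1 decouples the two segments. It says nothing about feedback gains or perturbation responses, which require the full OFC formulation.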

      (3) In my prior review I raised a concern that the authors seem to be claiming that because they can use a single control policy for both coarticulated and separated movement sequences, there need not be any higher-level or explicit specification of whether the movements are sequential. While much of that language has been removed, it still appears in a few places (e.g., p. 13, lines 403-404). As previously noted, the authors' control policy can generate both types of movements as long as the proper constraints are provided to the model. However, these constraints must be specified somewhere (potentially explicitly, as the authors do by providing them as task instructions). Moreover, in typical sequence tasks, although some movements become coarticulated, people also tend to form chunks with distinct chunk boundaries, which presumably means that there is at least some specification of the sequential ordering of these chunks that must exist (otherwise the authors' model might suggest that people can coarticulate forever without needing to exhibit any chunk boundaries). Hence the authors should limit themselves to the narrow claim that a single control policy can lead to separated or coarticulated movements given an appropriate set of constraints, but acknowledge that their work cannot speak to where or how those constraints are specified in humans (i.e., that there could still be an explicit sequence representation guiding coarticulation). 

      We thank the reviewer for raising this point. We do not dispute that the controller needs to be set according to the constraints of the task, which must be specified somewhere. In our view, this problem is similar to the question of how a cost function (or a task representation) is transformed into a control policy in the brain, which is unknown in general. In the earlier version, our intention was to stress that separation can occur without necessarily implying that the goals are processed independently (as in Figure 1A and Zimnik 2021). To avoid confusion on this point, we modified this statement in the new version as follows:

      Line 405: 

      “A straightforward interpretation could be that stopping at the first target invoked a completely different strategy in which the control of the two reaches was performed independently (Figure 1A), effectively separating the two movements, whereas executing them rapidly could produce the merging of the two sub-movements into a coarticulated sequence. While this is conceptually valid, it is not necessary and the model provides a more nuanced view: both apparent separation and coarticulation of the two motor patterns can be explained within the same framework of flexible feedback control. These different modes of sequence execution still require proper specification of the task constraints in the model, such as number of intermediate steps, dwell-time, or velocity limit. Such specifications must be considered as input to the controller.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 57: Distinct hypotheses. 

      Line 209, The term "planned holistically" is confusing here. Seems like the authors suggest that the sequence is "planned holistically" as long as all sequence elements are given during the optimization process. 

      We changed the sentence as follows.

      Line 218: 

      “Overall, the model predicted that even if a feedback control policy was computed by optimizing the whole sequence over a long time-horizon, the requirements associated with intermediate goals determine how early in the sequence the second (future) target can influence the feedback controller”

      Line 336, It was not clear to me why the authors explained "the weak significant" results of PEC shortening in R0 given the nonsignificant values in R1. 

      We wanted to be transparent about whether changing the statistical analysis would lead to different interpretations, such as sequence encoding even before the long-latency epochs. However, we realized that this could lead to confusion, and we deleted this sentence in the updated manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      About Weakness #2, to clarify this point the authors should either model and discuss what it would take for their model to account for multiple targets ahead, or else run a study to show that in this task people indeed only ever plan 1 target ahead.  

      Please see our response above (in Weakness #2).

      I am still puzzled by why people would resist the perturbation more when they eventually have to move in the direction of the perturbation (e.g., p 10 lines 313-314). Perhaps this is simply due to the geometry of the task, but it could also depend on what participants were trying to accomplish in the experiment. To help clarify this, the authors should report exactly what instructions were given to participants in each task condition.  

      The simulations suggest that the observed responses to perturbation are an optimal way to perform the task given the task constraints on accuracy, control effort and constraints at intermediate goals. The intuition is that modulating the acceleration at the intermediate goal is preferred to missing it. This, however, depends on the cost parameters.

      Below, in Author response image 1, we show simulations in which we varied the accuracy requirements at the intermediate goal and the total motor cost parameters. As expected, increasing the cost on accuracy of the intermediate reach, or decreasing the cost on motor output, modulated the hand deviation (simulations not included in the article).

      Author response image 1.

      Impact of movement costs (motor effort and intermediate goal reach errors) on the hand path following a mechanical perturbation   

      Our observations suggest that participants’ behaviour agreed with the interpretation derived from the model. We clarified the exact instructions in the Methods section. Note that the instructions were given at the beginning of the task and did not differ across the different conditions involving changes in the location of T2 or perturbation direction:

      Line 594:

      Participants were given the following instructions verbally: “Wait in the starting circle until you receive a GO signal, where the target circles turn red and you will simultaneously hear a beep sound. When the circles turn red, react quickly, move as soon, and as straight as possible to target 1 and then move to target 2. You will get two points at the end of the trial if you reach T1 in the prescribed time window and then move to T2, and in all other cases you will not receive any points. Importantly, once you reach T1 you should try to come out of it quickly. If you stay in T1 for more than 150 ms then T2 will disappear and you will receive only one point. Additionally, in some trials, a force will perturb your hand towards the right or left direction randomly while moving towards T1. The instructions remain the same in the presence of perturbations. Try to score as many points as you can.”

      Additionally, we added the following lines in the results description:

      Line 284:

      “The influence of the second target on the lateral hand deviation was qualitatively similar to that observed in model simulations, and counterintuitive to what we might expect without the help of the model simulations. As observed in the model simulations (see also Supplementary Figure S2), lateral hand deviation was smaller when the perturbation was in the direction of the second target (T2) and vice-versa. This was consistent for both rightward and leftward perturbation conditions. Both the model and humans expressed this strategy that can be seen as an emergent feature of efficient feedback control during production of movement sequences. Additionally, even though behavior was reproduced in simulations, changing the cost on control effort and/or accuracy of intermediate reaches could modulate the sequence-dependent changes in curvature.”

      I am not sure if "the data and code for simulations can be provided by the corresponding author" satisfies the eLife/PLoS software guidelines (i.e., that it be deposited in a public repository).

      Thank you for pointing this out. This sentence was added by mistake.

      We modified this statement in the updated manuscript. 

      “The data and code from simulations and experiments are available in the public repository ‘figshare’ at the following link (https://figshare.com/s/865a8b77c264ef17a181).”

    1. Author response:

      eLife assessment

      Cav2 voltage-gated calcium channels play key roles in regulating synaptic strength and plasticity. In contrast to mammals, invertebrates like Drosophila encode a single Cav2 channel, raising questions on how diversity in Cav2 is achieved from a single gene. Here, the authors present convincing evidence that two alternatively spliced isoforms of the Cac gene (cacophony, also known as Dmca1A and nightblindA) enable diverse changes in Cav2 expression, localization, and function in synaptic transmission and plasticity. These valuable findings will be of interest to a variety of researchers.

      We suggest replacing “two alternatively spliced isoforms of the Cac gene” by “two alternatively spliced mutually exclusive exon pairs of the Cac gene”. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Bell et al. describes an analysis of the effects of removing one of two mutually exclusive splice exons at two distinct sites in the Drosophila CaV2 calcium channel Cacophony (Cac). The authors perform imaging and electrophysiology, along with some behavioral analysis of larval locomotion, to determine whether these alternatively spliced variants have the potential to diversify Cac function in presynaptic output at larval neuromuscular junctions. The authors provide valuable insights into how alternative splicing at two sites in the calcium channel alters its function.

      Strengths:

      The authors find that both of the second alternatively spliced exons (I-IIA and I-IIB) that are found in the intracellular loop between the 1st and 2nd set of transmembrane domains can support Cac function. However, loss of the I-IIB isoform (predicted to alter potential beta subunit interactions) results in 50% fewer channels at active zones and a decrease in neurotransmitter release and the ability to support presynaptic homeostatic potentiation. Overall, the study provides new insights into Cac diversity at two alternatively spliced sites within the protein, adding to our understanding of how presynaptic calcium channel function can be regulated by splicing.

      Weaknesses:

      The authors find that one splice isoform (IS4B) in the first S4 voltage sensor is essential for the protein's function in promoting neurotransmitter release, while the other isoform (IS4A) is dispensable. The authors conclude that IS4B is required to localize Cac channels to active zones. However, I find it more likely that IS4B is required for channel stability and leads to the protein being degraded, rather than any effect on active zone localization. More analysis would be required to establish that as the mechanism for the unique requirement for IS4B.

      We agree that we need to explain more clearly why IS4B is unlikely to be required for channel stability, but instead likely has a unique function at the presynaptic active zone of fast synapses. We will address this by revising text and by providing additional data. If IS4B were required for evoked release because it supported channel protein stability, then the removal of IS4B should cause protein degradation throughout all sub-neuronal compartments and throughout the CNS, but this is not the case. First, upon removal of IS4B in adult motoneurons (which use cac channels at the presynapse and somatodendritically, Ryglewski et al., 2012), evoked release from axon terminals is abolished (as at the larval NMJ), but somatodendritic cac inward current is present. If IS4B were required for cac channel stability, somatodendritic current should also be abolished. We will add these data to the ms. Second, immunohistochemistry for tagged IS4B channels reveals that these are present not only at presynaptic active zones at the NMJ but also throughout the VNC motor neuropils. Excision of IS4B causes the absence of cac channels from the presynaptic active zones at the NMJ and throughout the VNC neuropils (and accordingly this is lethal). By contrast, tagged IS4A channels (with IS4B excised) are not found at the presynaptic terminals of fast synapses, but instead in other distinct parts of the CNS. We will also provide data to show this. Together these data are in line with a unique requirement of IS4B at presynaptic active zones (not excluding additional functions of IS4B), whereas IS4A-containing cac isoforms mediate different functions.

      We appreciate the additional reviewer suggestions to the authors that we will address point by point when revising the ms. 

      Reviewer #2 (Public Review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones. Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There are some outstanding questions regarding the mechanisms underlying the changes in Cac localization and function, and some additional suggestions are listed below for the authors to consider in strengthening this study. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

      We agree that some additional information on cac isoform localization (in particular for splicing at the IS4 site) will strengthen the manuscript. We will address this by providing additional data and revising text (see responses to reviewers 1 and 3). We are also grateful for the additional reviewer suggestions which we will address point by point when revising the ms.  

      Reviewer #3 (Public Review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of approximately 50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      (1) Based on the data shown in Figures 2A-C, and 2H, it is difficult to judge the localization of the cac isoforms. Could they analyze cac localization with regard to Brp localization (similar to Figure 3; the term "co-localization" should be avoided for confocal data), as well as cac and Brp fluorescence intensity in the different genotypes for the experiments shown in Figure 2 and 3 (Brp intensity appears lower in the dI-IIA example shown in Figure 3G)? Furthermore, heterozygous dIS4B imaging data (Figure 2C) should be quantified and compared to heterozygous cacsfGFP/+.

      We understand the reviewer’s comment and will do the following to convincingly demonstrate absence of cac from presynaptic active zones upon IS4B excision. First, we will show selective enlargements of IS4A and IS4B with Brp in presynaptic active zones to show distinct cac label in active zones following excision of IS4A but not following excision of IS4B. Second, we will provide Pearson’s co-localization coefficients of Brp with IS4B and with IS4A, respectively. Third, we will reduce the intensity of the green channels in figures 2C and 2H to the same levels as in 2A and B, and H control to allow a fair comparison of cac intensities following excision of IS4B versus excision of IS4A and control. We had increased intensity to show that following excision of IS4B, no distinct cac label is found in active zones, even at high exaggerated image brightness. However, we agree with the reviewer that the bright background hampers interpretation and thus will show the same intensity in all images that need to be compared.
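
      The Pearson's coefficient mentioned above can be illustrated with a minimal sketch. The intensity values are hypothetical and stand in for pixel intensities sampled from the same region of interest in the Brp and cac channels; they are not measurements from this study, and the function name `pearson` is an assumption made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long intensity lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical pixel intensities (arbitrary units) along one line through a bouton.
brp = [10, 200, 180, 12, 9, 220]          # Brp puncta at pixels 1, 2 and 5
cac_punctate = [8, 150, 160, 10, 7, 170]  # cac label concentrated at the same pixels
cac_diffuse = [41, 40, 42, 40, 42, 41]    # flat background, no distinct puncta

r_punctate = pearson(brp, cac_punctate)   # close to 1
r_diffuse = pearson(brp, cac_diffuse)     # close to 0
```

      A coefficient near 1 indicates that the cac signal rises and falls with the Brp signal, whereas a diffuse background yields a coefficient near 0 regardless of image brightness.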

      (2) They conclude that I-II splicing is not required for cac localization (p. 13). However, cac channel number is reduced in dI-IIB. Could the channels be mis-localized (e.g., in the soma/axon)? What is their definition of localization? Could cac be also mis-localized in dIS4B? Furthermore, the Western Blots indicate a prominent decrease in cac levels in dIS4B/+ and dI-IIB (Figure 1D). How do the decreased protein levels seen in both genotypes fit to a "localization" defect? Could decreased cac expression levels explain the phenotypes alone?

      We will precisely define channel localization, and we will explain why it is highly unlikely that the absence of IS4B channels as well as the lower number of I-IIA channels are simply a consequence of reduced expression, but instead of splice variant specific channel function and localization. For example, upon excision of IS4B no cac channels are found at the presynaptic active zones and these synapses are thus non-functional. The isoforms containing the mutually exclusive IS4A exon are expressed and mediate other functions (see also response to reviewer 1) but cannot substitute IS4B containing isoforms at the presynapse. In fact, our Western blots are in line with reduced cac expression if all isoforms that mediate evoked release are missing, again indicating that the presynapse specific cac isoforms cannot be replaced by other cac isoforms (see also below, response to (3)). Feedback mechanisms that regulate cac expression in the absence of presynapse specific cac isoforms are beyond the scope of this study.

      (3) Cac-IS4B is required for Cav2 expression, active zone localization, and synaptic transmission. Similarly, loss of cac-I-IIB reduces calcium channel expression and number. Hence, the major phenotype of the tested splice isoforms is the loss of/a reduction in Cav2 channel number. What is the physiological role of these isoforms? Is the idea that channel numbers can be regulated by splicing? Is there any data from other systems relating channel number regulation to splicing (vs. transcription or post-transcriptional regulation)?

      We will provide additional evidence that mutually exclusive splicing at the IS4 site results in cac channels that localize to the presynaptic active zone (IS4B) versus cac channels that localize to other brain parts and/or other subneuronal compartments (see response to reviewer 1). In addition, we already show in figure 2J that IS4B is required for normal cac HVA current, and we can add data showing that IS4A is not essential for cac HVA current. Similarly, for I-II we find it unlikely that differential splicing regulates channel numbers, but rather splice variant specific functions in different brain parts and different sub-neuronal compartments. To substantiate this interpretation, we will add data from developing adult motoneurons showing that excision of I-IIA causes reduced activity induced calcium influx into dendrites (new data), but it does not reduce channel number at the larval NMJ (figure 4). In our opinion these data are not in line with the idea that splicing regulates cac expression levels, and this in turn, results in specific defects in distinct neuronal compartments. However, we agree that the lack of isoforms with specific functions results in altered overall cac expression levels as indicated by our Western data. If isoforms normally abundantly expressed throughout most neuropils are missing due to exon excision, we indeed find less cac protein in Westerns. By contrast, the lack of isoforms with little abundance has little effect on cac expression levels. This may be the result of unknown feedback mechanisms which are beyond the scope of this study.

      (4) Although not supported by statistics, and as appreciated by the authors (p. 14), there is a slight increase in PSC amplitude in dIS4A mutants (Figure 2). Similarly, PSC amplitudes appear slightly larger (Figure 3J), and cac fluorescence intensity is slightly higher (Figure 3H) in dI-IIA mutants. Furthermore, cac intensity and PSC amplitude distributions appear larger in dI-IIA mutants (Figures 3H, J), suggesting a correlation between cac levels and release. Can they exclude that IS4A and/or I-IIA negatively regulate release? I suggest increasing the sample size for Canton S to assess whether dIS4A mutant PSCs differ from controls (Figure 2E). Experiments at lower extracellular calcium may help reveal potential increases in PSC amplitude in the two genotypes (but are not required). A potential increase in PSC amplitude in either isoform would be very interesting because it would suggest that cac splicing could negatively regulate release.

      There are several possibilities to explain this, but as none of the effects are statistically significant, we prefer to not investigate this in depth. However, given that we cannot find IS4A at the presynaptic active zone, IS4A is unlikely to have a direct negative effect on release probability. Nonetheless, given that IS4A containing cac isoforms mediate functions in other neuronal compartments it may regulate release indirectly by affecting action potential shape. We will provide data in response to the more detailed suggestions to authors that will provide additional insight.

      (5) They provide compelling evidence that IS4A is required for the amplitude of somatic sustained HVA calcium currents. However, the evidence for effects on biophysical properties and activation voltage (p. 13) is less convincing. Is the phenotype confined to the sustained phase, or are other aspects of the current also affected (Figure 2J)? Could they also show the quantification of further parameters, such as CaV2 peak current density, charge density, as well as inactivation kinetics for the two genotypes? I also suggest plotting peak-normalized HVA current density and conductance (G/Gmax) as a function of Vm. Could a decrease in current density due to decreased channel expression be the only phenotype? How would changes in the sustained phase translate into altered synaptic transmission in response to AP stimulation?

      Most importantly, HVA current is mostly abolished upon excision of IS4B (not IS4A, we think the reviewer accidentally mixed up the genotype). This indicates that the cac isoforms that mediate evoked release encode HVA channels. However, the somatodendritic current shown in figure 2J that remains upon excision of IS4B is mediated by IS4A containing cac isoforms. Please note that these never localize to the presynaptic active zone, thus the small inactivating HVA that remains in figure 2J does normally not mediate evoked release. Therefore, the interpretation is that specifically HVA current encoded by IS4B cac isoforms is required for synaptic transmission. Reduced cac current density is not the cause for this phenotype because a specific current component is absent. 

      We agree with the reviewer that a deeper electrophysiological analysis of cac currents mediated by IS4B containing isoforms will be instructive. However, a precise analysis of activation and inactivation voltages and kinetics suffers from space clamp issues in recordings from the soma of such complex neurons (DLM motoneurons of the adult fly). Therefore, we will analyze the currents in a heterologous expression system and present these data to the scientific community as a separate study at a later time point.

      (6) Why was the STED data analysis confined to the same optical section, and not to max. intensity z-projections? How many and which optical sections were considered for each active zone? What were the criteria for choosing the optical sections? Was synapse orientation considered for the nearest neighbor Cac - Brp cluster distance analysis? How do the nearest-neighbor distances compare between "planar" and "side-view" Brp puncta?

      Max. z-projections would be imprecise because they can artificially suggest close proximity of label that is close in x and y but far away in z. Therefore, the analysis was executed in xy-direction of various planes of entire 3D image stacks. We considered active zones of different orientations (Fig. 4C, D). In fact, we searched the entire z-stacks until we found active zones of all orientations shown in figures 4C1-C6 within the same boutons. The same active zone orientations were analyzed for all exon-out mutants with cac localization in active zones. The distance between cac and brp did not change if viewed from the side.
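
      The projection artifact described above can be made concrete with a minimal sketch. The coordinates are hypothetical (in micrometers) and the function names are assumptions for illustration; the point is only that discarding z can turn a large separation into an apparent close apposition.

```python
import math

def distance_3d(p, q):
    """Euclidean distance using x, y and z."""
    return math.dist(p, q)

def distance_projected(p, q):
    """Distance after a z-projection, i.e. using x and y only."""
    return math.dist(p[:2], q[:2])

# Hypothetical puncta positions (x, y, z) in micrometers.
cac_punctum = (1.00, 1.00, 0.20)
brp_punctum = (1.05, 1.00, 1.20)  # nearly the same x/y, but 1 µm away in z

d_proj = distance_projected(cac_punctum, brp_punctum)  # 0.05 µm apart in the projection
d_true = distance_3d(cac_punctum, brp_punctum)         # about 1 µm apart in the volume
```

      In the projection the two puncta appear to be 50 nm apart, although they are separated by roughly 1 µm in the image stack.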

      (7) Cac clusters localize to the Brp center (e.g., Liu et al., 2011). They conclude that Cav2 localization within Brp is not affected in the cac variants (p. 8). However, their analysis is not informative regarding a potential offset between the central cac cluster and the Brp "ring". Did they/could they analyze cac localization with regard to Brp ring center localization of planar synapses, as well as Brp-ring dimensions?

      In the top views (planar) we did not find any clear offset in cac orientation to brp between genotypes. This study focuses on cac splice isoform specific localization and function. Possible effects of different cac isoforms on Brp-ring dimensions or other aspects of scaffold structure are not central to our study, in particular given that Brp puncta are clearly present even if cac is absent from the synapse (Fig. 2H), indicating that cac is not instructive for the formation of the Brp scaffold.  

      (8) Given the accelerated PSC decay/ decreased half width in dI-IIA (Fig. 5Q), I recommend reporting PSC charge in Figure 3, and PPR charge in Figures 5A-D. The charge-based PPRs of dI-IIA mutants likely resemble WT more closely than the amplitude-based PPR. In addition, miniature PSC decay kinetics should be reported, as they may contribute to altered decay kinetics. How could faster cac inactivation kinetics in response to single AP stimulation result in a decreased PSC half-width? Is there any evidence for an effect of calcium current inactivation on PSC kinetics? On a similar note, is there any evidence that AP waveform changes accelerate PSC kinetics? PSC decay kinetics are mainly determined by GluR decay kinetics/desensitization. The arguments supporting the role of cac splice isoforms in PSC kinetics outlined in the discussion section are not convincing and should be revised.

      We agree that reporting charge in figure 3 will be informative and will do so. We also understand the reviewer’s concern attributing altered PSC kinetics to presynaptic cac channel properties. We will tone down our interpretation in the discussion and list possible alterations in presynaptic AP shape or Cav2 channel kinetics as alternative explanations (not conclusions). Moreover, we will quantify postsynaptic GluRIIA abundance to test whether altered PSC kinetics are caused by altered GluRIIA expression. In our opinion, the latter is more instructive than mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells.

      (9) Paired-pulse ratios (PPRs): On how many sweeps are the PPRs based? In which sequence were the intervals applied? Are PPR values based on the average of the second over the first PSC amplitudes of all sweeps, or on the PPRs of each sweep and then averaged? The latter calculation may result in spurious facilitation, and thus to the large PPRs seen in dI-IIB mutants (Kim & Alger, 2001; doi: 10.1523/JNEUROSCI.21-24-09608.2001).

      We agree that the PP protocol and analyses have to be described more precisely in the methods, and we will do so. PPR values are based on the PPRs of each sweep and then averaged. We are aware of the study of Kim and Alger 2001, but it does not affect our data interpretation because all genotypes were analyzed identically, but only the I-IIB excision resulted in the large data spread shown in figure 5.
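
      The difference between the two PPR calculations raised by the reviewer can be illustrated with a minimal sketch. The amplitudes below are hypothetical and chosen only to make the arithmetic transparent; the function names are assumptions for illustration and do not describe the analysis code used in the study.

```python
from statistics import mean

def ppr_mean_of_ratios(first, second):
    """Compute the PPR of each sweep (A2/A1), then average across sweeps."""
    return mean(a2 / a1 for a1, a2 in zip(first, second))

def ppr_ratio_of_means(first, second):
    """Average the amplitudes across sweeps first, then take mean(A2)/mean(A1)."""
    return mean(second) / mean(first)

# Hypothetical PSC amplitudes (nA) from three sweeps:
# the first response varies from sweep to sweep, the second is constant.
first_psc = [1.0, 2.0, 4.0]
second_psc = [2.0, 2.0, 2.0]

per_sweep = ppr_mean_of_ratios(first_psc, second_psc)  # (2 + 1 + 0.5) / 3
pooled = ppr_ratio_of_means(first_psc, second_psc)     # 2 / (7 / 3)
```

      With these numbers the per-sweep average exceeds 1 while the pooled ratio is below 1, because sweeps with a small first amplitude contribute disproportionately large ratios. This is the bias toward apparent facilitation described by Kim and Alger (2001), and it grows with the variability of the first response.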

      (10) Could the dI-IIB phenotype be simply explained by a decrease in channel number/ release probability? To test this, I propose investigating PPRs and short-term dynamics during train stimulation at lower extracellular Ca2+ concentration in WT. The Ca2+ concentration could be titrated such that the first PSC amplitude is similar between WT and dI-IIB mutants. This experiment would test if the increased PPR/depression variability is a secondary consequence of a decrease in Ca2+ influx, or specific to the splice isoform.

      Indeed, our interpretation is precisely that the decreased PSC amplitude upon I-IIB excision is caused mainly by reduced channel number (see discussion page 14, last paragraph to page 15, first paragraph). In addition, we are grateful for the reviewer’s suggestion to titrate the external calcium such that the first PSC amplitude matches the one in ΔI-IIB to test whether altered short term plasticity is solely a function of altered channel number or whether additional causes, such as altered channel properties, also play into this. We will conduct these experiments and include them in the revised manuscript.

      (11) How were the depression kinetics analyzed? How many trains were used for each cell, and how do the tau values depend on the first PSC amplitude? Time constants in the range of a few (5-10) milliseconds are not informative for train stimulations with a frequency of 1 or 10 Hz (the unit is missing in Figure 5H). Also, the data shown in Figures 5E-K suggest slower time constants than 5-10 ms. Together, are the data indeed consistent with the idea that dI-IIB does not only affect cac channel number, but also PPR/depression variability (p. 9)?

      For each animal, the amplitudes of each PSC were plotted over time and fitted with a single exponential. For depression at 1 and 10 Hz, we used one train per animal, and 5-6 animals per genotype (as reflected in the data points in Figs 5H and 5L). Given that the tau values are highly similar between control and excision of I-IIA, but ΔI-IIA tends to have larger single PSC amplitudes, differences in first PSC amplitude do not seem to skew the data (but see also response to comment 10 above). We thank the reviewer for pointing out that tau values in the range of ms are not informative at 1 and 10 Hz stimulations (Figs 5H and 5L). We mis-labeled (or did not label) the axes. The label should read seconds, not milliseconds. We apologize, and this will be corrected accordingly.
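
      The single exponential fit described above can be sketched as follows. This is a minimal illustration under stated assumptions: the amplitudes are synthetic, the decay is assumed to approach zero rather than a steady-state plateau, and the fit is done by linear least squares on the logarithm of the amplitudes, which may differ from the fitting routine actually used.

```python
import math

def fit_single_exponential(times, amplitudes):
    """Fit A(t) = A0 * exp(-t / tau) by linear least squares on ln(A)."""
    n = len(times)
    logs = [math.log(a) for a in amplitudes]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx                    # equals -1 / tau
    intercept = y_mean - slope * t_mean  # equals ln(A0)
    return math.exp(intercept), -1.0 / slope

# Synthetic 10 Hz train: 20 stimuli, one every 0.1 s, decaying with tau = 0.5 s.
times_s = [i * 0.1 for i in range(20)]
amps_na = [50.0 * math.exp(-t / 0.5) for t in times_s]

a0, tau_s = fit_single_exponential(times_s, amps_na)
```

      Because the time axis is given in seconds, the fitted time constant is also in seconds (here about 0.5 s, i.e. 500 ms). At 10 Hz consecutive stimuli are 100 ms apart, so a time constant of a few milliseconds could not be resolved, which is consistent with the axis label needing to read seconds.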

      In sum, pending the outcome of additional important control experiments for GluRIIA abundance (see response to comment 8) and titration of control PSC amplitude for the first pulse of paired pulses in ΔI-IIB (see response to comment 10) we will either modify or further support that interpretation.

      (12) The GFP-tagged I-IIA and mEOS4b-tagged I-IIB cac puncta shown in Figure 6N appear larger than the Brp puncta. Endogenously tagged cac puncta are typically smaller than Brp puncta (Gratz et al., 2019). Also, the I-IIA and I-IIB fluorescence sometimes appear to be partially non-overlapping. First, I suggest adding panels that show all three channels merged. Second, could they analyze the area and area overlap of I-IIA and I-IIB with regard to each other and to Brp, and compare it to cac-GFP? Any speculation as to how the different tags could affect localization? Finally, I recommend moving the dI-IIA and dI-IIB localization data shown in Figure 6N to an earlier figure (Figure 1 or Figure 3).

      We will show panels with all three labels merged as suggested by the reviewer. For the size of the puncta: this could reflect different numbers and types of fluorophores on the different antibodies used and thus different point spread, chromatic aberration, different laser and detector intensities etc. We will re-analyze the data to test whether there are systematic differences in size. We do not want to speculate whether the different tags have any effect on localization precision because of the abovementioned reasons as well as artificial differences in localization precision that can be suggested by different antibodies. We prefer to not move the figure because we believe it is informative to show our finding that active zones usually contain both splice variants together with the finding that only one splice variant is required for PHP.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) In the "Introduction" section, an important aspect that requires attention pertains to the discussion surrounding the heterodimerization of CXCR4 and CCR5. Notably, the manuscript overlooks a recent study (https://doi.org/10.1038/s41467-023-42082-z) elucidating the mechanism underlying the formation of functional dimers within these G protein-coupled receptors (GPCRs)…The inclusion of this study within the manuscript would significantly enrich the contextual framework of the work, offering readers a comprehensive understanding of the current knowledge surrounding the structural dynamics and functional implications of CXCR4 and CCR5 heterodimerization.

      We thank the reviewer for his/her recommendation to enrich the contextual framework of our study. The Nature Communications paper by Di Marino et al. was published after we sent the first version of our manuscript to eLife, and therefore was not included in the discussion. As the reviewer rightly indicates, this paper elucidates the mechanism underlying the formation of functional dimers within CCR5 and CXCR4. Using metadynamics approaches, the authors emphasize the importance of distinct transmembrane regions for dimerization of the two receptors. In particular, CXCR4 shows two low energy dimer structures and the TMVI-TMVII helices are the preferred interfaces involved in the protomer interactions in both cases. Although the study uses in silico techniques, it also includes the molecular binding mechanism of CCR5 and CXCR4 in the membrane environment, as the authors generate a model in which the receptors are immersed in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) phospholipid bilayer with 10% cholesterol. This is an important point in this study, as membrane lipids also interact with membrane proteins, and the lipid composition affects CXCR4 oligomerization (Gardeta S.R. et al. Front. Immunol. 2023). In particular, Di Marino et al. find a cholesterol molecule placed in-between the two CXCR4 protomers where it engages a series of hydrophobic interactions with residues including Leu132, Val214, Leu216 and Phe249. Then, the polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes protomer binding. In our hands, the F249L mutation in CXCR4 reverted the antagonism of AGR1.137, suggesting that the compound binds, among others, this residue. We should, nonetheless, indicate that we analyzed receptor oligomerization and not CXCR4 dimerization, which was the main object of the Di Marino et al. study. 
      It is therefore also plausible that residues other than those described as essential for CXCR4 dimerization might participate in receptor oligomerization. We can speculate that AGR1.137 might affect cholesterol binding to CXCR4 and, therefore, alter dimerization/oligomerization. Additionally, the CXCR4 x-ray structure with PDB code 3ODU (Wu B. et al. Science, 2010) experimentally shows the presence of two fatty acid molecules in contact with both TMV and TMVI. These molecules closely interact with hydrophobic residues in the protein, thereby stabilizing it in a hydrophobic environment. Although more experiments will be needed to clarify the mechanism involved, our results suggest that cholesterol and/or other lipids also play an important role in CXCR4 oligomerization and function, as seen for other GPCRs (Jakubik J. & El-Fakahany E.E. Int J Mol Sci. 2021). However, we should also consider that other factors not included in the analysis by Di Marino et al. can also affect CXCR4 oligomerization; for instance, the co-expression of other chemokine receptors and/or other GPCRs that heterodimerize with CXCR4 might affect CXCR4 dynamics at the cell membrane, similar to other membrane proteins such as CD4, which also forms complexes with CXCR4 (Martinez-Muñoz L. et al. Mol. Cell 2018).

      The revised discussion contains references to the study by Di Marino et al. to enrich the contextual framework of our data.

      (2) In "various sections" of the manuscript, there appears to be confusion surrounding the terminology used to refer to antagonists. It is recommended to provide a clearer distinction between allosteric and orthosteric antagonists to enhance reader comprehension. An orthosteric antagonist typically binds to the same site as the endogenous ligand, directly blocking its interaction with the receptor. On the other hand, an allosteric antagonist binds to a site distinct from the orthosteric site, inducing a conformational change in the receptor that inhibits the binding of the endogenous ligand. By explicitly defining the terms "allosteric antagonist" and "orthosteric antagonist" within the manuscript, readers will be better equipped to discern the specific mechanisms discussed in the context of the study.

      The behavior of the compounds described in our manuscript (AGR1.135 and AGR1.137) fits with the definition of allosteric antagonists, as they bind to a site distinct from the orthosteric site, although they only block some ligand-mediated functions and not others. This would mean that they are not formally antagonists and should not be considered as allosteric compounds, as their binding to CXCR4 does not alter CXCL12 binding, although they might affect its affinity. In this sense, our compounds respond much better to the concept of negative allosteric modulators (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). They act by binding to a site distinct from the orthosteric site and selectively block some downstream signaling pathways but not others induced by the same endogenous agonist.

      To avoid confusion and to clarify the role of the compounds described in this study, we now refer to them as negative allosteric modulators throughout the manuscript.

      (3) In the Results section, the computational approach employed for "screening small compounds targeting CXCR4, particularly focusing on the inhibition of CXCL12-induced CXCR4 nanoclustering", requires clarification due to several points of incomprehension. The following recommendations aim to address these concerns and enhance the overall clarity of the section:

      (1) Computational Approach and Binding Mode Description: 

      - Explicitly describe the methodology for identifying the pocket/cleft area in angstroms (Å) on the CXCR4 protein structure. Include details on how the volume of the cleft enclosed by TMV and TMVI was determined, as this information is not readily apparent in the provided reference (https://doi.org/10.1073/pnas.1601278113).

      The identification of the cleft was based on the observations by Wu et al. (Wu B. et al. Science 2010) who described the presence of bound lipids in the area formed by TMV and VI, and those of Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. 2016) on the importance of TMVI in the transmission of conformational changes promoted by CXCL12 on CXCR4 towards the cytoplasmic surface of the receptor to link the binding site with signaling activation. Collectively, these results, and our previous data on the critical role of the N-terminus region of TMVI for CXCR4 oligomerization (Martinez-Muñoz L. et al. Mol. Cell 2018), focused our in silico screening on this region. Once we detected that several compounds bound CXCR4 in this region, the cleft properties were calculated by subtracting the compound structure. The resulting PDB was analyzed using the PDBsum server (Laskowski R.A. et al. Protein Sci. 2018). Volume calculations were obtained from the server's analysis of surface clefts by SURFNET (Laskowski R. A. J. Mol. Graph. 1995). The theoretical interaction surface between the selected compounds and CXCR4 and the atomic distances between the protein residues and the compounds were calculated using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007) (Fig. I, only for review purposes). The analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1,381 Å³ that were not connected to the orthosteric site. In the case of AGR1.137, the data revealed two distinct clefts of 790 Å³ and 580 Å³ (Fig. I, only for review purposes). These details have been included in the revised manuscript (New Fig. 1A, Supplementary Fig 8A, B).

      (4) Clarify the statement regarding the cleft being "surface exposed for interactions with the plasma membrane," particularly in the context of its embedding within the membrane.

      For GPCRs, transmembrane domains represent binding sites for bioactive lipids that play important functional and physiological roles (Huwiler A. & Zangemeister-Wittke U. Pharmacol. Ther. 2018). The channel between TMV and TMVI connects the orthosteric chemokine binding pocket to the lipid bilayer and is occupied by an oleic acid molecule, according to the CXCR4 structure published in 2010 (Wu B. et al. Science 2010). In addition, the target region contains residues involved in cholesterol (and perhaps other lipids) engagement (Di Marino et al. Nat. Commun. 2023). Taken together, these data support our statement that the cleft supports interactions between CXCR4 molecules and the plasma membrane. 

      Moreover, the data of Di Marino et al. also support that CCR5 and CXCR4 have a symmetric and an asymmetric binding mode. Therefore, either dimeric structure has the possibility to form trimers, tetramers, and even oligomers by using the free binding interface to complex with another protomer. This hypothesis suggests that the interaction of dimers to form oligomers should involve residues distinct from those included in the dimeric conformation.

      The sentence has been modified in the revised manuscript to clarify comprehension.

      (5) Discuss the rationale behind targeting the allosteric binding pocket instead of the orthosteric pocket, outlining potential advantages and disadvantages.

      The advantages and disadvantages of using negative allosteric modulators vs orthosteric antagonists have now been included in the revised discussion.

      The majority of GPCR-targeted drugs function by binding to the orthosteric site of the receptor, and are agonists, partial agonists, antagonists or inverse agonists. These orthosteric compounds can have off-target effects and poor selectivity due to highly homologous receptor orthosteric sites and to abrogation of spatial and/or temporal endogenous signaling patterns. 

      The alternative is to use allosteric modulators, which can tune the functions associated with the receptors without affecting the orthosteric site. They can be positive, negative or neutral modulators, depending on their effect on the functionality of the receptor (Foster D.J. & Conn P.J. Neuron 2017). For example, the use of a negative allosteric modulator of a chemokine receptor to dampen pathological signaling events, while retaining full signaling for non-pathological activities might limit adverse effects (Kohout T.A. et al. J. Biol. Chem. 2004). In this case, the negative allosteric modulator 873140 blocks CCL3 binding on CCR5 but does not alter CCL5 binding (Watson C. et al. Mol. Pharmacol. 2005). In other cases, allosteric modulators can stabilize a particular receptor conformation and block others. The mechanism of action of the anti-HIV-1, FDA-approved CCR5 allosteric modulator maraviroc (Jin J. et al. Sci. Signal. 2018) is attributed to its ability to modulate CCR5 dimer populations and their subsequent subcellular trafficking and localization to the cell membrane (Jin J. et al. Sci. Signal. 2018). Two CCR5 dimeric conformations that are imperative for membrane localization were present in the absence of maraviroc; however, an additional CCR5 dimer conformation was discovered after the addition of maraviroc, and all homodimeric conformations were further stabilized. This finding is consistent with the observation that CCR5 dimers and oligomers inhibit HIV host-cell entry, likely by preventing the HIV-1 co-receptor formation.

      It is well known that GPCRs activate G proteins, but they also recruit additional proteins (e.g., β-arrestins) that induce signaling cascades which, in turn, can direct specific subsets of cellular responses independent of G protein activation (Eichel K. et al. Nature 2018) and are responsible for either therapeutic or adverse effects. Allosteric modulators can thus be used to block these adverse effects without influencing the therapeutic benefits. This was the case in the design of G protein-biased agonists for the kappa opioid receptor, which maintain the desirable antinociceptive and antipruritic effects and eliminate the sedative and dissociative effects in rodent models (Brust T.F. et al. Sci. Signal. 2016).

      (6) Provide the PDB ID of the CXCR4 structure used as a template for modeling with SwissModel. Explain the decision to model the structure from the amino acid sequence and suggest an alternative approach, such as utilizing AlphaFold structures and performing classical molecular dynamics with subsequent clustering for the best representative structure.

      The PDB structure used as a template for modeling CXCR4 was 3ODU. This information was already included in the Materials and Methods section. At the time we performed these analyses, several crystallographic structures of CXCR4 in complex with different molecules and peptides had been deposited in the PDB. None of them included a full construct containing the complete receptor sequence to provide a suitable sample for X-ray structure resolution, as the N- and C-terminal ends of CXCR4 are very flexible loops. In addition, the CXCR4 constructs contained T4 lysozyme inserted between helices TMV and TMVI to increase the stability of the protein, a common strategy used to facilitate crystallogenesis of GPCRs (Zou Y. et al. PLoS One 2012). Therefore, we generated a CXCR4 homology model using the SWISS-MODEL server (Waterhouse A. et al. Nucleic Acids Res. 2018). This program reconstructed the loop between TMV and TMVI, a domain particularly important in this study that was not present in any of the crystal structures available in the PDB. The model structure was, nonetheless, still incomplete, as it began at P27 and ended at S319 because the terminal ends were not resolved in the crystal structure used as a template. Nevertheless, we considered that these terminal ends were not involved in CXCR4 oligomerization.

      As AlphaFold was not available at the time we initiated this project, we did not use it. However, we have now updated our workflow to current methods and predicted the structure of the target using AlphaFold (Jumper J. et al. Nature 2021) and the sequence available under UniProt entry P61073. We prepared the ligands using OpenBabel (O’Boyle N.M. et al., J. Cheminformatics 2011), with Gasteiger charge assignment, and generated 10 conformers for each input ligand using the OpenBabel genetic algorithm. We then prepared the target structure with OpenMM, removing all waters and possible heteroatoms, and adding all missing atoms. We next predicted the target binding pockets with fPocket (Le Guilloux V. et al. BMC Bioinformatics 2009), p2rank (Krivak R. & Hoksza, J. Cheminformatics 2018), and AutoDock autosite (Ravindranath P.A. & Sanner M.F. Bioinformatics 2016). We chose only those pockets between TMV and TMVI (see answer to point 3). We merged the results of the three programs into so-called consensus pockets, in which two pockets are considered sufficiently similar if at least 75% of their surfaces are shared (del Hoyo D. et al. J. Chem. Inform. Model. 2023). Among the consensus pockets, one was significantly larger than the others and was therefore selected. We then docked the ligand conformers in this pocket using AutoDock GPU (Santos-Martins D. et al. J. Chem. Theory Comput. 2021), LeDock (Liu N & Xu Z., IOP Conf. Ser. Earth Environ. Sci. 2019), and Vina (Eberhardt J. et al. J. Chem. Inf. Model. 2021). The number of docked poses varied from 210 to 287. We scored each pose with the Vina score using ODDT (Wójcikowski M. et al. J. Cheminform. 2015). Then, we clustered the different solutions into groups whose maximum RMSD was 1 Å. This resulted in 40 clusters; the representative of each cluster was the pose with the maximum Vina score. This analysis confirmed that the selected compounds bound this pocket (Author response image 1).
When required, we calculated the binding affinity using Schrodinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013), in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3Å from the ligand. This information has now been included in the revised version of the manuscript.
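
      As an illustration only (this is not the code used in the study), the pose-clustering step described above can be sketched in a few lines of Python. The sketch assumes a greedy variant of clustering in which a pose joins a cluster only if its RMSD to every member is within the 1 Å cutoff, that poses share the same atom order and reference frame, and that the representative is picked by the highest score, following the wording above. The names rmsd, cluster_poses and pick are hypothetical; with raw Vina energies, where lower is better, one would pass pick=min.

```python
import math


def rmsd(a, b):
    """RMSD (in Å) between two poses given as equal-length lists of (x, y, z).

    Assumes matched atom order and a common reference frame (no realignment).
    """
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def cluster_poses(poses, scores, cutoff=1.0, pick=max):
    """Greedy clustering with a maximum intra-cluster RMSD of `cutoff`.

    A pose joins the first cluster in which its RMSD to every existing
    member is <= cutoff; otherwise it starts a new cluster.
    Returns (clusters, representatives) as lists of pose indices.
    """
    clusters = []
    for i, pose in enumerate(poses):
        for members in clusters:
            if all(rmsd(pose, poses[j]) <= cutoff for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    # One representative per cluster, chosen by score (assumption: pick=max).
    representatives = [pick(m, key=lambda j: scores[j]) for m in clusters]
    return clusters, representatives
```

      Because the grouping is greedy, the result can depend on the order of the poses; a hierarchical complete-linkage clustering would remove that dependence at a higher computational cost.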

      Author response image 1.

      AGR1.135 docking in CXCR4 using the updated protocol for ligand docking. Cartoon representation colored in gray with TMV and TMVI shown in blue and pink, respectively. AGR1.135 is shown in stick representation with carbons in yellow, oxygens in red and nitrogens in blue.

      (7) Specify the meaning of "minimal interaction energy" and where (if present) the interaction scores are reported in the text.

      By minimal interaction energy we mean the best docking score, that is, the best score obtained in our docking studies. These data were not included in the previous manuscript due to space restrictions but are now included in the revised manuscript.

      (8) You performed docking studies using GLIDE to identify potential binding sites for the small compounds on the CXCR4 protein. The top-scoring binders were then subjected to further refinement using PELE simulations. However, I realize that a detailed description of the specific binding modes of these compounds was not provided in the text. Please make the description of binding poses more detailed.

      Firstly, to assess the reliability of this method, a PELE study was carried out for the control molecule IT1t, which is a small drug-like isothiourea derivative that has been crystallized in complex with CXCR4 (PDB code: 3ODU). IT1t is a CXCR4 antagonist that binds to the CXCL12 binding cavity and inhibits HIV-1 infection (Das D. Antimicrob. Agents Chemother. 2015; Dekkers S. et al. J. Med. Chem. 2023). Of the five best trajectories, two had clearly better binding energies and corresponded to almost the same predicted pose of the molecule. Although the predicted binding mode was not exactly the same as the one in the crystal structure, the approximation was very good, which validates the approach. Although PELE is a suitable technique to find potential binding sites, the predicted poses must be subsequently refined using docking programs.

      Analyzing the best trajectories for the remaining ligands, at least one of the best-scored poses was always located at the orthosteric binding site of CXCR4. Even though these poses showed good binding energies, they were discarded as the in vitro biological experiments indicated that the compounds were unable to block CXCL12 binding or CXCL12-mediated inhibition of cAMP release or CXCR4 internalization. Collectively, these data indicated that the selected compounds did not behave as orthosteric inhibitors of CXCR4. The CXCL12 binding pocket is the biggest cavity in CXCR4, and so PELE may tend to place the molecules near it. However, all the compounds presented other feasible binding sites with a comparable binding energy.

      AGR1.135 and AGR1.137 showed interesting poses between TMV and TMVI with very good binding energy (-51.4 and -37.2 kcal/mol, respectively). This was precisely the region we had previously selected for the in silico screening, as previously described (see response to point 3).

      AGR1.131 showed two poses with low binding energy that were placed between helices TMI and TMVII (-43.6 kcal/mol) and between helices TMV and TMVI (-39.8 kcal/mol). This compound was unable to affect CXCL12-mediated chemotaxis and was therefore used as an internal negative control, as it was selected in the in silico screening with the same criteria as the other compounds but failed to alter any CXCL12-mediated functions. PELE studies nonetheless provided different binding sites for each molecule, which had to be further studied using docking to obtain a more accurate binding mode. In agreement with the previous comment, we repeated the analysis using AlphaFold and the rest of the procedure described (see our response to point 6) and calculated the binding energies for all the compounds using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013). Calculations were performed in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. With the first method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -56.4 and -62.4 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -61.6 kcal/mol. With the second method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -57.9 and -67.6 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -62.2 kcal/mol.

      This information is now included in the text.
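
      For readers who want to compare the two MM-GBSA protocols quoted above side by side, the short Python sketch below tabulates the reported energies and computes how much each value changes when the atoms within 3 Å of the ligand are minimized. The dictionary layout and the function name minimization_shift are illustrative choices, not part of the original workflow; the numbers are those given in this response.

```python
# MM-GBSA binding energies (kcal/mol) as quoted in this response.
# "fixed": ligand and target held fixed.
# "minimized": atoms within 3 Å of the ligand energy-minimized.
mmgbsa = {
    "AGR1.135": {"site": "TMV-TMVI", "fixed": -56.4, "minimized": -57.9},
    "AGR1.137": {"site": "TMV-TMVI", "fixed": -62.4, "minimized": -67.6},
    "AGR1.131": {"site": "TMI-TMVII", "fixed": -61.6, "minimized": -62.2},
}


def minimization_shift(entry):
    """Change in energy (kcal/mol) when local minimization is applied.

    Negative values mean the minimized estimate is more favorable.
    """
    return round(entry["minimized"] - entry["fixed"], 1)


shifts = {name: minimization_shift(entry) for name, entry in mmgbsa.items()}

for name, entry in mmgbsa.items():
    print(f"{name:9s} {entry['site']:10s} "
          f"{entry['fixed']:7.1f} {entry['minimized']:7.1f} {shifts[name]:6.1f}")
```

      In this tabulation the local minimization lowers all three estimates, with the largest change for AGR1.137.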

      (9) (2) Experimental Design: -Justify the choice of treating Jurkat cells with a concentration of 50 μM of the selected compound. Consider exploring different concentrations and provide a rationale for the selected dosage. Additionally, clearly identify the type of small compound used in the initial experiment.

      The revised version contains a new panel in Fig. 1B showing a more detailed dose-response analysis with different concentrations (1-100 µM) of the compounds in the Jurkat migration experiments. In all cases, 100 µM nearly completely abrogated cell migration, but in order to reduce the amount of DMSO added to the cells we selected 50 µM for further experiments, as it was the concentration that inhibited 50-75% of ligand-induced cell migration. Regarding the type of small compounds used in the initial experiments, they were compounds included in the library described in reference #24 (Sebastian-Pérez V. et al. Med. Biol. Chem. 2017), which contains heterocyclic compounds. We would note that we do not consider AGR1.137 a final compound. We think that there is scope to develop AGR1.137-based second-generation compounds with greater solubility in water and greater specificity or affinity for CXCR4, and to evaluate delivery methods that might increase activity.

      (10) Avoid reporting details in rounded parentheses within the text; consider relocating such information to the Materials and Methods section or figure captions for improved readability.

      Most of the rounded parentheses within the text have been eliminated in the revised version of the manuscript to improve readability.

      (11) Elaborate on the virtual screening approach using GLIDE software, specifying the targeted site and methodology employed.

      For the virtual screening, we used the Glide module (SP and XP scoring functions) included in the Schrödinger software package, utilizing the corresponding 3D target structure and our MBC library (Sebastián-Pérez V et al. J. Chem. Inf. Model. 2017). The center of the catalytic pocket was selected as the centroid of the grid. In the grid generation, a scaling factor of 1.0 in van der Waals radius scaling and a partial charge cutoff of 0.25 were used. The SP poses of each compound were then rescored with the XP scoring function of Glide. In XP mode, ligand sampling was flexible, Epik state penalties were added and an energy window of 2.5 kcal/mol was used for ring sampling. In the energy minimization step, the distance-dependent dielectric constant was 4.0 with a maximum number of minimization steps of 100,000. In the clustering, poses were considered duplicates and discarded if both the RMS deviation was less than 0.5 Å and the maximum atomic displacement was less than 1.3 Å.
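
      To make the duplicate-pose criterion explicit, the following Python sketch shows one way such a two-threshold test could be written. It is not Glide's implementation; the function names pose_deviation and is_duplicate are hypothetical, and the sketch assumes that both poses list the same atoms in the same order and in the same coordinate frame.

```python
import math


def pose_deviation(a, b):
    """Return (RMS deviation, maximum atomic displacement) in Å.

    `a` and `b` are equal-length lists of (x, y, z) coordinates with
    matched atom order.
    """
    dists = [math.dist(p, q) for p, q in zip(a, b)]
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    return rms, max(dists)


def is_duplicate(a, b, rms_cutoff=0.5, max_disp_cutoff=1.3):
    """True only if BOTH thresholds quoted in the text are satisfied."""
    rms, max_disp = pose_deviation(a, b)
    return rms < rms_cutoff and max_disp < max_disp_cutoff
```

      Requiring both conditions matters: a pose in which a single atom moves by 1.4 Å while the rest stay in place can have an RMS deviation below 0.5 Å and would still be kept as a distinct pose.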

      (12) Provide clarity on the statement that AGR1.131 "theoretically" binds the same motif, explaining the docking procedure used for this determination.

      In the in silico screening, AGR1.131 was one of the 40 selected compounds that showed, according to the PELE analysis (see answer to point 8), a pose with low binding energy (-39.8 kcal/mol) between the TMV and TMVI helices, which is the area selected for the screening. Nonetheless, using the initial workflow, its best pose was placed between helices TMI and TMVII (-43.7 kcal/mol). In conclusion, although AGR1.131 could also be placed in the TMV-TMVI region, the most favorable pose was in the area between TMI and TMVII. In addition, the compound was included in the biological screening, where it did not affect CXCL12-mediated chemotaxis. We thus decided to use it as an internal negative control, as it has a skeleton very similar to AGR1.135 and AGR1.137 and can interact with the TM domains of CXCR4 without promoting biological effects. This statement has been clarified in the revised text.

      (13) Toxicity Testing:

      -Enhance the explanation of the approach to testing the toxicity of the compound in Jurkat cells. Consider incorporating positive controls to strengthen the assessment and clarify the experimental design.

      All the selected compounds in the in silico screening were initially tested for propidium iodide incorporation in treated cells in a toxicity assay, and some of them were discarded for further experiments (e.g., AGR1.103 and VSP3.1).

      Further evaluation of Jurkat cell viability was determined by cell cycle analysis using propidium iodide.  Supplementary Fig. 1B included the percentage of each cell cycle phase, and data indicated no significant differences between the treatments tested. Nevertheless, at the suggestion of the reviewer, and to clarify this issue, positive controls inducing Jurkat cell death (staurosporine and hydrogen peroxide) have also been included in the new Supplementary Fig. 2. The new figure also includes a table showing the percentage of cells in each cell-cycle phase.  

      (14) In the Results section concerning "AGR1.135 and AGR1.137 blocking CXCL12-mediated CXCR4 nanoclustering and dynamics", several points can be improved to enhance clarity and coherence: 1. Specificity of Low Molecular Weight Compounds:  

      -Clearly articulate how AGR1.135 and AGR1.137 specifically target homodimeric CXCR4 and provide an explanation for their lack of impact on heterodimeric CXCR4-CCR5 in that region.

      First of all, we should clarify that when we talk about receptor nanoclustering, oligomers refer to complexes including 3 or more receptors and, therefore, the residues involved in these interactions can differ from those involved in receptor dimerization. Moreover, our FRET experiments did not indicate that the compounds alter receptor dimerization (see new Supplementary Fig. 7). Of note, mutant receptors unable to oligomerize can still form dimers (Martínez-Muñoz L. et al. Mol. Cell 2018; García-Cuesta E.M. et al. Proc. Natl. Acad. Sci. USA 2022). Additionally, we believe that these oligomers can also include other chemokine receptors/proteins expressed at the cell membrane, which we are currently studying using different models and techniques.

      We have results supporting the existence of CCR5/CXCR4 heterodimers (Martínez-Muñoz L et al. Proc. Natl. Acad. Sci. USA 2014), in line with the data published by Di Marino et al. However, in the current study we have not evaluated the impact of the selected compounds on other CXCR4 complexes distinct from CXCR4 oligomers. Our Jurkat cells do not express CCR5 and, therefore, we cannot discuss whether AGR1.137 affects CCR5/CXCR4 heterodimers. The chemokine field is very complex and most receptors can form dimers (homo- and heterodimers) as well as oligomers (Martinez-Muñoz L. et al. Pharmacol & Therap. 2011) when co-expressed. Evaluating different receptor combinations in the same experiment is a complex task, as the number of potential combinations between distinct expressed receptors makes the analysis very difficult. We started with CXCR4 as a model and will continue later with other possible CXCR4 complexes. In addition, for the analysis of CCR5/CXCR4 dynamics, it is much better to use dual-TIRF techniques, which allow the simultaneous detection of two distinct molecules coupled to different fluorochromes.

      Regarding the data of Di Marino et al., it is possible that the compounds might also affect heterodimeric conformations of CXCR4. This aspect has also been broached in the revised discussion. We would again note that we evaluated CXCR4 oligomers and not monomers or dimers; this is especially relevant when we compare the residues involved in these processes as they might differ depending on the receptor conformation considered. This issue was also hypothesized by Di Marino et al. (see our response to point 4).

      (15) When referring to "unstimulated" cells, provide a more detailed explanation to elucidate the experimental conditions and cellular state under consideration.

      Unstimulated cells refer to the cells in basal conditions, that is, cells in the absence of CXCL12. For TIRF-M experiments, transiently-transfected Jurkat cells were plated on glass-bottomed microwell dishes coated with fibronectin; these are the unstimulated cells. To observe the effect of the ligand, dishes were coated as above plus CXCL12 (stimulated cells). We have clarified this point in the material and methods section of the revised version.

      (16) 2. Paragraph Organization

      -Reorganize the second paragraph to eliminate redundancy and improve overall flow. A more concise and fluid presentation will facilitate reader comprehension and engagement.

      The second paragraph has been reorganized to improve overall flow.

      (17) Ensure that each paragraph contributes distinct information, avoiding repetition and redundancy.

      We have carefully revised each paragraph of the manuscript to avoid redundancy.

      (18) 3. Claim of Allosteric Antagonism:

      -Exercise caution when asserting that "AGR1.135 and AGR1.137 behave as allosteric antagonists of CXCR4" based on the presented results. Consider rephrasing to reflect that the observed effects suggest the potential allosteric nature of these compounds, acknowledging the need for further investigations and evidence.

      To avoid misinterpretations on the effect of the compounds on CXCR4, as we have commented in our response to point 2, we have substituted the term allosteric inhibitors with negative allosteric modulators, which refer to molecules that act by binding a site distinct from the orthosteric site, and selectively block some downstream signaling pathways, whereas others induced by the same endogenous or orthosteric agonist are unaffected (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). Our data indicate that the selected small compounds do not block ligand binding or G protein activation or receptor internalization, but inhibit receptor oligomerization and ligand-mediated directed cell migration.

      (19) In the Results section discussing the "incomplete abolition of CXCR4-mediated responses in Jurkat cells by AGR1.135 and AGR1.137", several points can be refined for better clarity and completeness:  1. Inclusion of Positive Controls: 

      -Consider incorporating positive controls in relevant experiments to provide a comparative benchmark for assessing the impact of AGR1.135 and AGR1.137. This addition will strengthen the interpretation of results and enhance the experimental rigor. 

      The in vivo experiments (Fig. 7E,F) used AMD3100, an orthosteric antagonist of CXCR4, as a positive control. We also included AMD3100, as a positive control of inhibition when evaluating the effect of the compounds on CXCL12 binding (Fig. 3, new Supplementary Fig. 3). The revised version of the manuscript also includes the effect of this inhibitor on other relevant CXCL12-mediated responses such as cell migration (Fig. 1B), receptor internalization (Fig. 3A), cAMP production (Fig. 3C), ERK1/2 and AKT phosphorylation (Supplementary Fig. 4), actin polymerization (Fig. 4A), cell polarization (Fig. 4B, C) and cell adhesion (Fig. 4D), to facilitate the interpretation of the results and improve the experimental rigor.

      (20) 2. Clarification of Terminology: 

      -Clarify the term "CXCR4 internalizes" by providing context, perhaps explaining the process of receptor internalization and its relevance to the study.

      We refer to CXCR4 internalization as a CXCL12-mediated endocytosis process that results in a reduction of CXCR4 levels on the cell surface. We use CXCR4 internalization in this study for two purposes. First, for CXCR4 and other chemokine receptors, internalization is mediated by ligand-induced clathrin vesicles (Venkatesan et al. 2003), a process that triggers CXCR4 aggregation in these vesicles. We have previously determined that the oligomers of receptors detected by TIRF-M remain unaltered in cells treated with inhibitors of clathrin vesicle formation and of internalization processes (Martinez-Muñoz L. et al. Mol. Cell 2018). Moreover, we have described a mutant CXCR4 that cannot form oligomers but internalizes normally in response to CXCL12 (Martinez-Muñoz L. et al. Mol. Cell 2018). The observation in this manuscript of normal CXCL12-mediated endocytosis in the presence of the negative allosteric modulators of CXCR4 that abrogate receptor oligomerization reinforces the idea that the oligomers detected by TIRF are not related to receptor aggregates involved in endocytosis. Second, receptor internalization is not affected by the allosteric compounds, indicating that they downregulate some CXCL12-mediated signaling events but not others (new Fig. 3).

      All these data have been included in the revised discussion of the manuscript.

      (21) Elaborate on the meaning of "CXCL12 triggers normal CXCR4mut internalization" to enhance reader understanding.

      We have previously described a triple-mutant CXCR4 (K239L/V242A/L246A; CXCR4mut). The mutant residues are located in the N-terminal region of TMVI, close to the cytoplasmic region, thus limiting the CXCR4 pocket described in this study (see our response to point 3). This mutant receptor dimerizes but neither oligomerizes in response to CXCL12 nor supports CXCL12-induced directed cell migration, although it can still trigger some Ca2+ flux and is internalized after ligand activation (Martinez-Muñoz L. et al. Mol. Cell 2018).  We use the behavior of this mutant (CXCR4mut) to show that the CXCR4 oligomers and the complexes involved in internalization processes are not the same and to explain why we evaluated CXCR4 endocytosis in the presence of the negative allosteric modulators.

      As we indicated in a previous answer to the reviewer, these issues have been re-elaborated in the revised version.

      (22) 3. Discrepancy in CXCL12 Concentration:

      -Address the apparent discrepancy between the text stating, "...were stimulated with CXCL12 (50 nM, 37°C)," and the figure caption (Fig. 3A) reporting a concentration of 12.5 nM. Rectify this inconsistency and provide an accurate and clear explanation.

      We apologize for this error, which is now corrected in the revised manuscript. With the exception of the cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, in the remaining experiments the optimal concentration of CXCL12 employed was 50 nM. These concentrations were optimized in previous works of our laboratory using the same type of experiment. We should also remark that in the experiments using lipid bilayers or TIRF-M experiments, CXCL12 is used to coat the plates and therefore it is difficult to determine the real concentration of the ligand that is retained in the surface of the plates after the washing steps performed prior to adding the cells. In addition, we use 100 nM CXCL12 to create the gradient in the chambers used to perform the directed-cell migration experiments.

      (23) 4. Speculation on CXCL12 Binding:

      -Refrain from making speculative statements, such as "These data suggest that none of the antagonists alters CXCL12 binding to CXCR4," unless there is concrete evidence presented up to that point. Clearly outline the results that support this conclusion.

      Figure 3B and Supplementary Figure 3 show CXCL12-ATTO700 binding by flow cytometry in cells pretreated with the negative allosteric modulators. We have also included AMD3100, the orthosteric antagonist, as a control for inhibition. While these experiments showed no major effect of the compounds on CXCL12 binding, we cannot rule out small changes in the affinity of the interaction between CXCL12 and CXCR4. Consequently, we have rewritten these statements.

      (24) 5. Corroboration of Data:

      -Specify where the corroborating data from immunostaining and confocal analysis are reported, ensuring readers can access the relevant information to support the conclusions drawn in this section.

      In agreement with the suggestion of the reviewer, the revised manuscript includes data from immunostaining and confocal analysis to complement Fig. 4B (new Fig. 4C). The revised version also includes some representative videos for the TIRF experiments shown in Figure 2 to improve clarity.

      (25) In the Results section concerning "AGR1.135 and AGR1.137 antagonists and their direct binding to CXCR4", several aspects need clarification and refinement for a more comprehensive and understandable presentation: 1. Workflow Clarification:

      -Clearly articulate the workflow used for assessing the binding of AGR1.135 and AGR1.137 to CXCR4. Address the apparent contradiction between the inability to detect a direct interaction and the utilization of Glide for docking in the TMV-TMVI cleft.

      To address the direct interaction of the compounds with CXCR4, we intentionally avoided modifying the small compounds with different labels, which could affect their properties. We therefore attempted a fluorescence spectroscopy strategy to formally prove the ability of the small compounds to bind CXCR4, but this failed because AGR1.135 is yellow in color, which interfered with the determinations. We also tried a FRET strategy (see new Supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers when AGR1.135 was evaluated, but again the yellow color interfered with FRET determinations. Moreover, AGR1.137 did not modify FRET efficiency of CXCR4 dimers. Therefore, we were unable to detect the interaction of the compounds with CXCR4.

      We elected to develop an indirect strategy; in silico, we evaluated the binding-site using docking and molecular dynamics to predict the most promising CXCR4 binding residues involved in the interaction with the selected compounds. Next, we generated point mutant receptors of the predicted residues and re-evaluated the behavior of the allosteric antagonists in a CXCL12-induced cell migration experiment. Obviously, we first discarded those CXCR4 mutants that were not expressed on the cell membrane as well as those that were not functional when activated with CXCL12. Using this strategy, we eliminated the interference due to the physical properties of the compounds and demonstrated that if the antagonism of a compound is reversed in a particular CXCR4 mutant it is because the mutated residue participates or interferes with the interaction between CXCR4 and the compound, thus assuming (albeit indirectly) that the compound binds CXCR4. 

      To select the specific mutations included in the analysis, our strategy was to generate point mutations in residues present in the TMV-TMVI pocket of CXCR4 that were not directly proposed as critical residues involved in chemokine engagement, signal initiation, signal propagation, or G protein-binding, based on the extensive mutational study published by Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. USA 2016).

      (26) Provide a cohesive explanation of the transition from docking evaluation to MD analysis, ensuring a transparent representation of the methodology.

      Based on the aim of this work, the workflow shown in Author response image 2 was proposed to predict the binding mode of the selected molecules. Firstly, a CXCR4 model was generated to reconstruct some unresolved parts of the protein structure; then a binding site search using PELE software was performed to identify the most promising binding sites; subsequently, docking studies were performed to refine the binding mode of the molecules; and finally, molecular dynamics simulations were run to determine the most stable poses and predict the residues that we should mutate to test whether the compounds interact with CXCR4.

      Author response image 2.

      Workflow followed to determine the binding mode of the studied compounds.

      (27) 2. Choice of Software and Techniques:

      -Justify the use of "AMBER14" and the PELE approach, considering  their potential obsolescence.

      These experiments were performed five years ago when the project was initiated. As the reviewer indicates, AMBER14 and PELE approaches might perhaps be considered obsolescent. Thus, we have predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and the sequence available under UniProt entry P61073. The complete analysis performed (see our response to point 4) confirmed that the compounds bound the selected pocket, as we had originally determined using PELE. These new analyses have been incorporated into the revised manuscript.

      (28) -Discuss the role of the membrane in the receptor-ligand interaction. Elaborate on how the lipidic double layer may influence the binding of small compounds to GPCRs embedded in the membrane.

      Biological membranes are vital components of living organisms, providing a diffusion barrier that separates cells from the extracellular environment, and compartmentalizing specialized organelles within the cell. In order to maintain the diffusion barrier and to keep it electrochemically sealed, a close interaction of membrane proteins with the lipid bilayer is necessary. It is well known that this is important, as many membrane proteins undergo conformational changes that affect their transmembrane regions and that may regulate their activity, as seen with GPCRs (Daemen F.J. & Bonting S.L., Biophys. Struct. Mech. 1977; Gether U. et al. EMBO J. 1997). The lateral and rotational mobility of membrane lipids supports the sealing function while allowing for the structural rearrangement of membrane proteins, as they can adhere to the surface of integral membrane proteins and flexibly adjust to a changing microenvironment. In the case of the first atomistic structure of CXCR4 (Wu B. et al. Science 2010), it was indicated that for dimers, monomers interact only at the extracellular side of helices V and VI, leaving at least a 4-Å gap between the intracellular regions, which is presumably filled by lipids. In particular, they indicated that the channel between TMV and TMVI that connects the orthosteric chemokine binding pocket to the lipid bilayer is occupied by an oleic acid molecule. Recently, Di Marino et al., analyzing the dimeric structure of CXCR4, found a cholesterol molecule placed in between the two protomers, where it engages a series of hydrophobic interactions with residues located in the area between TMI and TMVI (Leu132, Val214, Leu216, Leu246, and Phe249). The polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes its binding mode. This finding confirms that cholesterol might play an important role in mediating and stabilizing receptor dimerization, as seen in other GPCRs (Pluhackova, K., et al. PLoS Comput. Biol. 2016). 
In addition, we have previously observed that, independently of the structural changes on CXCR4 triggered by lipids, the local lipid environment also regulates CXCR4 organization, dynamics and function at the cell membrane and modulates chemokine-triggered directed cell migration. Prolonged treatment of T cells with bacterial sphingomyelinase promoted the complete and sustained breakdown of sphingomyelins and the accumulation of the corresponding ceramides, which altered both membrane fluidity and CXCR4 nanoclustering and dynamics. Under these conditions, CXCR4 retained some CXCL12-mediated signaling activity but failed to promote efficient directed cell migration (Gardeta S.R. et al. Front. Immunol. 2022). Collectively, these data demonstrate the key role that lipids play in the stabilization of CXCR4 conformations and in regulating its lateral mobility, influencing their associated functions. These considerations have been included in the revised version of the manuscript. 

      (29) 3. Stable Trajectories and Binding Mode Superimposition -Specify the criteria for defining "stable trajectories" to enhance reader understanding.

      There are several ways to describe the stability of an MD simulation, based on the convergence of energies, distances or ligand-target interactions, among others. In this work, we use the expression “stable trajectories” to refer to simulations in which the ligand trajectory converges and the ligand RMSD does not fluctuate by more than 0.25 Å. This definition is now included in the revised text.
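      As an illustration only, the stability criterion described above can be sketched in a few lines of Python. The sketch assumes that "fluctuation" means the largest deviation of the ligand RMSD from its mean over the analysed frames; the function name `is_stable`, the optional `window` argument and the example values are hypothetical and are not taken from the analysis scripts used in the study.

```python
from statistics import mean


def is_stable(rmsd, threshold=0.25, window=None):
    """Return True if the ligand RMSD (in Å) stays within `threshold` of its mean.

    Assumption: fluctuation is measured as the maximum absolute deviation
    from the mean RMSD over the analysed frames. If `window` is given, only
    the last `window` frames are analysed (e.g. to skip equilibration).
    """
    frames = list(rmsd) if window is None else list(rmsd)[-window:]
    if not frames:
        raise ValueError("no RMSD values to analyse")
    centre = mean(frames)
    return max(abs(value - centre) for value in frames) <= threshold


# Hypothetical RMSD traces (Å), for illustration only.
converged = [2.00, 2.10, 2.05, 1.95]
drifting = [1.0, 2.0, 1.5]
print(is_stable(converged), is_stable(drifting))
```

      Other definitions (for example, the standard deviation of the RMSD or the difference between its maximum and minimum) would be equally defensible; the threshold of 0.25 Å is the only value taken from the text.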

      (30)  Clarify the meaning behind superimposing the two small compounds and ensure that the statement in the figure caption aligns with the information presented in the main text.

      We apologize for the error in the previous Fig. 5A and in its legend. The figure was created by superimposing the protein component of the poses for the two compounds, AGR1.135 and AGR1.137, rather than the compounds themselves. As panel 5A was confusing, we have modified all Fig. 5 in the revised manuscript to improve clarity.

      (31) 4. Volume Analysis and Distances:

      -Provide details on how the volume analysis was computed and how distances were accounted for. Consider adding a figure to illustrate these analyses, aiding reader comprehension.

      The cleft search and analysis were performed using the default settings of SURFNET (Laskowski R.A. J. Mol. Graph. 1995) included in the PDBsum server (Laskowski R.A. et al. Trends Biochem. Sci. 1997). The first run of the input model for CXCR4 3ODU identified a promising cleft of 870 Å³ in the lower half of the region flanked by TMV and TMVI, highlighting this area as a possible small molecule binding site (Fig. I, only for review purposes). Analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1381 Å³ that were not connected to the orthosteric site. The same procedure for AGR1.137 revealed two distinct clefts of 790 Å³ and 580 Å³, respectively (Fig. I, only for review purposes). Analysis of the atomic distances between the protein residues and the compounds was performed using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007). (Please see our response to point 3 and the corresponding figure).

      (32) 5. Mutant Selection and Relevance:

      -Clarify the rationale behind selecting the CXCR4 mutants used in the study. Consider justifying the choice and exploring the possibility of performing an alanine (ALA) scan for a more comprehensive mutational analysis.  

      The residues to be mutated were first selected on the basis of their presence in the proposed cleft and their direct interaction with the compounds, either by hydrogen bonding or by hydrophobic interactions. Secondly, none of the mutated residues belonged to the critical residues involved in transmitting the signal generated by the interaction of CXCL12 with the receptor. In any case, mutants producing a non-functional CXCR4 at the cell membrane were discarded after FACS analysis and chemotaxis experiments. Finally, the length and nature of the resulting mutations were designed mainly to occlude the cleft by introducing long residues such as lysines (I204K, L208K) or to alter hydrophobic interactions by changing the carbon side chain composition of the residues in the cleft. Indeed, we agree that an alanine scan would have been an alternative strategy to evaluate the residues involved in the interactions of the compounds. 

      (33) Reevaluate the statement regarding the relevance of the Y256F mutation for the binding of AGR1.137. If there is a significant impact on migration in the mutant (Fig. 6B), elaborate on the significance in the context of AGR1.137 binding.

      In the revised discussion we provide more detail on the relevance of Y256F mutation for the binding of AGR1.137 as well as for the partial effect of G207I and R235L mutations. The predicted interactions for each compound are depicted in new Fig. 6 C, D after LigPlot+ analysis (Laskowski R.A. & Swindells M.B. J. Chem. Inf. Model. 2011), showing that AGR1.135 interacted directly with the receptor through a hydrogen bond with Y256. When this residue was mutated to F, one of the anchor points for the compound was lost, weakening the potential interaction in the region of the upper anchor point.

      It is not clear how the Y256F mutation will affect the binding of AGR1.137, but other potential contacts cannot be ruled out since that portion of the compound is identical in both AGR1.135 and AGR1.137. This is especially true for its neighboring residues in the alpha helix, F249 and L208, as shown in the 3ODU structure (Fig. 6D), which are shown to be directly implicated in the interaction of both compounds. Alternatively, we cannot rule out that Y256 interacts with other TMs or lipids stabilizing the overall structure, which could reverse the effect of the mutant at a later stage (Author response image 3).

      Author response image 3.

      Cartoon representation of Y256 and its intramolecular interactions in the CXCR4 X-ray solved structure 3ODU. The TMV helix is colored in blue and TMVI in pink.

      (34) Address the apparent discrepancy in residue involvement between AGR1.135 and AGR1.137, particularly if they share the same binding mode in the same cleft.

      AGR1.135 and AGR1.137 exhibit comparable yet distinct binding modes, engaging with CXCR4 within a molecular cavity formed by TMV and TMVI. AGR1.135 binds to CXCR4 through three hydrogen bonds, two on the apical side of the compound that interact with residues TMV-G207 and TMVI-Y256 and one on the basal side that interacts with TMVI-R235 (Fig. 5A). This results in a more extended and rigid conformation when sharing hydrogen bonds, with both TMs occupying a surface area of 400 Å² and a length of 20 Å in the cleft between TMV and TMVI (Supplementary Fig. 8A). AGR1.137 exhibits a distinct binding profile, interacting with a more internal region of the receptor. This interaction involves the formation of a hydrogen bond with TMIII-V124, which induces a conformational shift in the TMVI helix towards an active conformation (Fig. 5B; Supplementary Fig. 13). Moreover, AGR1.137 may utilize the carboxyl group of V124 in TMIII and overlap with AGR1.135 binding in the cavity, interacting with the other 19 residues dispersed between TMV and VI to create an interaction surface of 370 Å² along 20 Å (Supplementary Fig. 8B). This is illustrated in the new Fig. 5B. AGR1.137 lacks the phenyl ring present in AGR1.135, resulting in a shorter compound with greater difficulty in reaching the lower part of TMVI where R235 sits. 

      Author response image 4.

      AGR1.135 and AGR1.137 interaction with TMV and TMVI.  The model shows the location of the compounds within the TMV-VI cleft, illustrated by a ribbon and stick representation. The CXCR4 segments of TMV and TMVI are represented in blue and pink ribbons respectively, and side chains for some of the residues defining the cavity are shown in sticks. AGR1.135 and AGR1.137 are shown in stick representation with carbon in yellow, nitrogen in blue, oxygen in red, and fluorine in green. Hydrogen bonds are indicated by dashed black lines, while hydrophobic interactions are shown in green. The figure reproduces the panels A, B of Fig. 5 in the revised manuscript.

      (35) In the Results section regarding "AGR1.137 treatment in a zebrafish xenograft model", the following points can be refined for clarity and completeness: 1. Cell Line Choice for Zebrafish Xenograft Model:

      -Explain the rationale behind the choice of HeLa cells for the zebrafish xenograft model when the previous experiments primarily focused on Jurkat cells. Address any specific biological or experimental considerations that influenced this decision.

      As far as we know, there are no available models of tumors in zebrafish using Jurkat cells. We looked for a tumoral cell system that expresses CXCR4 and could be transplanted into zebrafish. HeLa cells are derived from a human cervical tumor, express a functional CXCR4, and have been previously used for tumorigenesis analyses in zebrafish (Brown H.K. et al. Expert Opin. Drug Discover. 2017; You Y. et al. Front. Pharmacol. 2020). These cells grow in the fish and disseminate through the ventral area and can be used to determine primary tumor growth and metastasis. Nonetheless, we first analyzed in vitro the expression of a functional CXCR4 in these cells (Supplementary Fig. 10A), whether AGR1.137 treatment specifically abrogated CXCL12-mediated directed cell migration (Fig. 7A, B), and whether it affected cell proliferation (Supplementary Fig. 10B). As HeLa cells reproduce the in vitro effects detected for the compounds in Jurkat cells, we used this model in zebrafish. These issues were already discussed in the first version of our manuscript. 

      (36) 2. Toxicity Assessment in Zebrafish Embryos: 

      -Clarify the basis for stating that AGR1.137 is not toxic to zebrafish embryos. Consider referencing the Zebrafish Embryo Acute Toxicity Test (ZFET) and provide relevant data on lethal concentration (LC50) and non-lethal toxic phenotypes such as pericardial edema, head and tail necrosis, malformation, brain hemorrhage, or yolk sac edema.

      Tumor growth and metastasis kinetics within the zebrafish model have been extensively evaluated in many publications (White R. et al. Nat. Rev. Cancer. 2013; Astell K.R. and Sieger D. Cold Spring Harb. Perspect. Med. 2020; Chen X. et al. Front. Cell Dev. Biol. 2021; Weiss J.M. et al. eLife 2022; Lindhal G. et al. NPJ Precis. Oncol. 2024). Our previous experience using this model shows that tumors start having a more pronounced proliferation and lower degree of apoptosis from day 4 onwards, but we cannot keep the tumor-bearing larvae for that long for ethical reasons and also because we do not see much scientific benefit in unnecessarily extending the experiments. Anti-proliferative or pro-apoptotic effects of drugs can still be observed within the three days, even if this is then commonly seen as a larger reduction (instead of the smaller growth commonly seen in, for example, mouse tumor models) compared to controls. Initially, we characterized the evolution of implanted tumors in our system and how much they metastasize over time in the absence of treatment before testing the compounds (Author response image 5).

      The in vivo experiments were planned to validate efficacious concentrations of the investigated drugs rather than to derive in vivo IC50 or other values, which require testing of multiple doses. We have, however, included an additional concentration to show concentration-dependence and therefore on-target specificity of the drugs in the revised version of the manuscript (data also being elaborated in ongoing experiments). At this stage, we believe that adding the LC50 does not provide interesting new knowledge, and it is standard to only show results from the experimental endpoint (in our case 3 days post implantation). We agree that showing these new data points strengthens the manuscript and facilitates independent evaluation and conclusions to be drawn from the presented data. We have created new graphs where datapoints for each compound dose are shown.  

      Author response image 5.

      Evolution of the tumors and metastasis over time in the absence of any treatment. HeLa cells were labeled with 8 µg/mL Fast-DiI™ oil and then implanted in the dorsal perivitelline space of 2-day-old zebrafish embryos. Tumors were imaged within 2 hours of implantation and re-imaged every 24 h for three days. Changes in tumor size were evaluated as tumor area at day 1, 2 and 3 divided by tumor area at day 0, and metastasis was evaluated as the number of cells disseminated to the caudal hematopoietic plexus at day 1, 2 and 3 divided by the number of cells at day 3.
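      For readers who wish to reproduce the normalization described in this legend, a minimal sketch is given below. It assumes that each larva contributes one measurement per day; the helper name `relative_series` and all numerical values are hypothetical and serve only to show the arithmetic (tumor area relative to day 0, disseminated cell counts relative to day 3).

```python
def relative_series(values, reference_index):
    """Divide every measurement by the one at `reference_index`."""
    reference = values[reference_index]
    if reference == 0:
        raise ValueError("reference measurement is zero; ratio is undefined")
    return [value / reference for value in values]


# Hypothetical tumor areas (arbitrary units) at days 0, 1, 2 and 3.
tumor_area = [100.0, 120.0, 150.0, 200.0]
# Tumor size is expressed relative to day 0 (index 0).
relative_tumor_size = relative_series(tumor_area, reference_index=0)

# Hypothetical counts of disseminated cells at days 1, 2 and 3.
disseminated_cells = [2, 5, 10]
# Metastasis is expressed relative to day 3 (last entry), as in the legend.
relative_metastasis = relative_series(disseminated_cells, reference_index=-1)

print(relative_tumor_size, relative_metastasis)
```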

      Regarding the statement that AGR1.137 was not toxic, this was based on visual inspection of the zebrafish larvae at the end of the experiment, which also revealed a lack of drug-related mortality in these experiments. There are a number of differences in how our experiment was run compared with the standardized ZFET. ZFET evaluates toxicity from 0 hours post-fertilization to 1 or 2 days post-fertilization, whereas here we exposed zebrafish from 2 days post-fertilization to 5 days post-fertilization. The ZFET furthermore requires that the embryos are raised at 26ºC, whereas we kept the temperature as close as possible to a physiologically relevant temperature for the tumor cells (36ºC). In the ZFET, embryos are incubated in 96-well plates, whereas for our studies we required larger wells to be able to manipulate the larvae and avoid well edge-related imaging artefacts, and we therefore used 24-well plates. As such, the ZFET was for various reasons not applicable to our experimental settings. As we were not interested in rigorously determining the LD50 or other toxicity-related measurements, as our focus was instead on efficacy and we found that the targeted dose was tolerated, we did not evaluate multiple doses, including lethal doses of the drug, and are therefore not able to determine an LD50/LC50. We also did not find drug-induced non-lethal toxic phenotypes in this study, and so we cannot elaborate further on such phenotypes other than to simply state that the drug is well tolerated at the given doses. Therefore, the reference to ZFET in the manuscript was eliminated.

      (37) If supplementary information is available, consider providing it for a comprehensive understanding of toxicity assessments. 

      The effective concentration used in the zebrafish study was derived from the in vitro experiments. That being said, and as elaborated in our response to comment 36, we have added data for one additional dose to show the dose-dependent regulation of tumor growth and metastasis. 

      (38) 3. Optimization and Development of AGR1.137: 

      -Justify the need for further optimization and development of AGR1.137 if it has a comparable effect to AMD3100. Explain the specific advantages or improvements that AGR1.137 may offer over AMD3100. 

      AGR1.137 is highly hydrophobic and is very difficult to handle, particularly in in vivo assays; thus, for the negative allosteric modulators to be used clinically, it would be very important to increase their solubility in water. Contrastingly, AMD3100 is a water-soluble compound. Before using the zebrafish model, we performed several experiments in mice using AGR1.137, but the inhibitory results were highly variable, probably due to its hydrophobicity. We also believe that it would be important to increase the affinity of AGR1.137 for CXCR4, as the use of lower concentrations of the negative allosteric modulator would limit potential in vivo side effects of the drug. On the other hand, we are also evaluating distinct administration alternatives, including encapsulation of the compounds in different vehicles. These alternatives may also require modifications of the compounds. 

      AMD3100 is an orthosteric inhibitor and therefore blocks all the signaling cascades triggered by CXCL12. For instance, we observed that AMD3100 treatment blocked CXCL12 binding, cAMP inhibition, calcium flux, cell adhesion and cell migration (Fig. 3, Fig. 4), whereas the effects of AGR1.137 were restricted to CXCL12-mediated directed cell migration. Although AMD3100 was well tolerated by healthy volunteers in a single-dose study, it also promoted some mild and reversible events, including white blood cell count elevations and variations of urine calcium just beyond the reported normal range (Hendrix C.W. et al. Antimicrob. Agents Chemother. 2000). To treat viral infections, continuous daily dosing requirements of AMD3100 were impractical due to severe side effects including cardiac arrhythmias (De Clercq E. Front Immunol. 2015). For AMD3100 to be used clinically, it would be critical to control the timing of administration. In addition, side effects after long-term administration are a potential problem. Shorter-term usage and lower doses would be fundamental keys to its success in clinical use (Liu T.Y. et al. Exp. Hematol. Oncol. 2016). The use of a negative allosteric modulator that blocks cell migration but does not affect other signaling pathways triggered by CXCL12 would be, at least in theory, more specific and produce fewer side effects. These ideas have been incorporated into the revised discussion to reflect potential advantages or improvements that AGR1.137 may offer over AMD3100.

      (39) 4. Discrepancy in AGR1.137 and AMD3100 Effects:

      -Discuss the observed discrepancy where AGR1.137 exhibits similar effects to AMD3100 but only after 48 hours. Provide insights into the temporal dynamics of their actions and potential implications for the experimental design.

      Images and data shown in Fig. 7E, F correspond to days 0 and 3 after HeLa cell implantation (tumorigenesis) and only to day 3 in the case of metastasis data. The revised version contains the effect of two distinct doses of the compounds (10 and 50 µM, for AGR1.135 and AGR1.137 and 1 and 10 µM for AMD3100). 

      (40) In the "Discussion" section, there are several points that require clarification and refinement to enhance the overall coherence and depth of the analysis:  1. Reduction of Side-Effects: 

      -Provide a more detailed explanation of how the identified compounds, specifically AGR1.135 and AGR1.137, contribute to the reduction of side effects. Consider discussing specific mechanisms or characteristics that differentiate these compounds from existing antagonists.

      The sentence indicating that AGR1.135 and AGR1.137 contribute to reducing side effects is entirely speculative, as we have no experimental evidence to support it. We have therefore corrected this in the revised version. The origin of the sentence was that orthosteric antagonists typically bind to the same site as the endogenous ligand, thus blocking its interaction with the receptor. Therefore, orthosteric inhibitors (i.e. AMD3100) block all signaling cascades triggered by the ligand and therefore their functional consequences. However, the compounds described in this project are essentially negative allosteric modulators, that is, they bind to a site distinct from the orthosteric site, inducing a conformational change in the receptor that does not alter the binding of the endogenous ligand, and therefore block some specific receptor-associated functions without altering others. We observed that AGR1.137 blocked receptor oligomerization and directed cell migration, whereas CXCL12 still bound CXCR4, triggered calcium mobilization, inhibited cAMP release and promoted receptor internalization. This is why we speculated on the limitation of side effects. The statements have nonetheless been revised in the new version of the manuscript.

      (41) 2. Binding Site Clarification:

      -Address the apparent discrepancy between docking the small compounds in a narrow cleft formed by TMV and TMVI helices and the statement that AGR1.131 binds elsewhere. Clarify the rationale behind this assertion

      After the in silico screening, a total of 40 compounds were selected. These compounds showed distinct degrees of interaction with the cleft formed by TMV and TMVI and even with other potential interaction sites on CXCR4, with the exception of the ligand binding site according to the data described by Wescott et al. (PNAS 2016 113:9928-9933), as this possibility was discarded in the initial approach of the in silico screening. According to PELE analysis, AGR1.131 was one of the 40 selected compounds that showed a pose with low binding energy, -39.8 kcal/mol, between TMV and TMVI helices, that is, it might interact with CXCR4 through the selected area for the screening. It nonetheless also showed a best pose placed between helices TMI and TMVII, -43.7 kcal/mol. In any case, the compound was included in the biological screening, where it was unable to impact CXCL12-mediated chemotaxis (Fig. 1B). We then focused on AGR1.135 and AGR1.137, as they showed a higher inhibitory effect on CXCL12-mediated migration, and on AGR1.131 as an internal negative control. AGR1.131 has a skeleton very similar to the other compounds (Fig. 1C) and can interact with the TM domains of CXCR4 without promoting effects. None of the three compounds affected CXCL12 binding, CXCL12-mediated inhibition of cAMP release, or receptor internalization. However, whereas AGR1.135 and AGR1.137 blocked CXCL12-mediated CXCR4 oligomerization and directed cell migration towards CXCL12 gradients, AGR1.131 had no effect in these experiments (Fig. 3, Fig. 4). 

      Next, we performed additional theoretical calculations (PELE, docking, MD) to inspect in detail the potential binding modes of active and inactive molecules. Based on these additional calculations, we identified that whereas AGR1.135 and AGR1.137 showed preferential binding to the molecular pocket between TMV and TMVI, the best pose for AGR1.131 was located between TMI and TMVII, as the initial experiments indicated. These observations and data have been clarified in the revised discussion. 
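      The comparison of poses for AGR1.131 reduces to choosing the site with the most negative interaction energy. The short sketch below illustrates that selection using the two PELE energies quoted in this response (-39.8 and -43.7 kcal/mol); the dictionary layout and the function name `best_site` are hypothetical, and the snippet is not part of the PELE workflow itself.

```python
def best_site(energies_by_site):
    """Return the site whose pose has the lowest (most favorable) energy."""
    if not energies_by_site:
        raise ValueError("no poses supplied")
    return min(energies_by_site, key=energies_by_site.get)


# Interaction energies (kcal/mol) quoted in the text for AGR1.131.
agr1_131_poses = {
    "TMV-TMVI": -39.8,
    "TMI-TMVII": -43.7,
}

print(best_site(agr1_131_poses))
```

      A difference of a few kcal/mol between poses is only indicative, which is why the ranking was complemented by docking, MD and the biological assays described above.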

      (42) 3. Impact of Chemical Modifications:

      -Discuss the consequences of the distinct chemical groups in AGR1.135, AGR1.137, and AGR1.131, specifically addressing how variations in amine length and chemical nature may influence binding affinity and biological activity. Provide insights into the potential effects of these modifications on cellular responses and the observed outcomes in zebrafish. 

      The main difference between AGR1.131 and the other two compounds is the higher flexibility of AGR1.131 due to the additional CH2 linker, together with the lack of a piperazine ring. The additional CH2 linking the phenyl ring increases the flexibility of AGR1.131 when compared with AGR1.135 and AGR1.137, and the absence of the piperazine ring might be responsible for its lack of activity, as it makes this compound able to bind to CXCR4 (Fig. 1C).

      AGR1.137 was chosen in a second round. The additional presence of the tertiary amine (in the piperazine ring) allows the formation of quaternary ammonium salts in the aqueous medium and its substituents to increase its solubility (Fig 1C). This characteristic might be related to the absence of toxic effects of the compound in the zebrafish model.

      (43) 4. Existence of Distinct CXCR4 Conformational States: 

      -Provide more detailed support for the statement suggesting the "existence of distinct CXCR4 conformational states" responsible for activating different signaling pathways. Consider referencing relevant studies or experiments that support this claim.

      Classical models of GPCR allostery and activation, which describe an equilibrium between a single inactive and a single signaling-competent active conformation, cannot account for the complex pharmacology of these receptors. The emerging view is that GPCRs are highly dynamic proteins, and ligands with varying pharmacological properties differentially modulate the balance between multiple conformations.

      Just as a single photograph from one angle cannot capture all aspects of an object in movement, no one biophysical method can visualize all aspects of GPCR activation. In general, there is a tradeoff between high-resolution information on the entire protein versus dynamic information on limited regions. In the former category, crystal and cryo-electron microscopy (cryoEM) structures have provided comprehensive, atomic-resolution snapshots of scores of GPCRs both in inactive and active conformations, revealing conserved conformational changes associated with activation. However, different GPCRs vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Spectroscopic and computational approaches provide complementary information, highlighting the role of conformational dynamics in GPCR activation (Latorraca N.R.V. et al. Chem. Rev. 2017). In the absence of agonists, the receptor population is typically dominated by conformations closely related to those observed in inactive-state crystal structures (Manglik A. et al. Cell 2015). While agonist binding drives the receptor population towards conformations similar to those in active-state structures, a mixture of inactive and active conformations remains, reflecting “loose” or incomplete allosteric coupling between the orthosteric and transducer pockets (Dror R.O. et al. Proc. Natl. Acad. Sci. USA 2011). Surprisingly, for some GPCRs, and under some experimental conditions, a substantial fraction of unliganded receptors already reside in an active-like conformation, which may be related to their level of basal or constitutive signaling (Staus D.P. et al. J. Biol. Chem. 2019; Ye L. et al. Nature 2016). In our case, the negative allosteric modulators did not alter ligand binding and had only minor effects on specific CXCL12-mediated functions such as inhibition of cAMP release or receptor internalization, among others, but failed to regulate CXCL12-mediated actin dynamics and receptor oligomerization. Collectively, these data suggest that the described compounds alter the active conformation of CXCR4 and therefore support the presence of distinct receptor conformations that explain a partial activation of the signaling cascade.

      All these observations are now included in the revised discussion of the manuscript.

      (44) 5. Equilibrium Shift and Allosteric Ligands: 

      -Clarify the statement about "allosteric ligands shifting the equilibrium to favor a particular receptor conformation". Support this suggestion with references or experimental evidence

      In a previous answer (see our response to point 2), we explain why we define the compounds as negative allosteric modulators. These compounds do not bind the orthosteric binding site or a site distinct from the orthosteric site that alters the ligand-binding site. Their effect should be due to changes in the active conformation of CXCR4, which allow some signaling events whereas others are blocked. Our functional data thus support that through the same receptor the compounds separate distinct receptor-mediated signaling cascades, that is, our data suggest that CXCR4 has a conformational heterogeneity. It is known that GPCRs exhibit more than one “inactive” and “active” conformation, and the endogenous agonists stabilize a mixture of multiple conformations. Biased ligands or allosteric modulators can achieve their distinctive signaling profiles by modulating this distribution of receptor conformations (Wingler L.M. & Lefkowitz R.J. Trends Cell Biol. 2020). For instance, some analogs of angiotensin II do not appreciably activate Gq signaling (e.g., increases in IP3 and Ca2+) but still induce receptor phosphorylation, internalization, and mitogen-activated protein kinase (MAPK) signaling (Wei H. et al. Proc. Natl. Acad. Sci. USA 2003). Some of these ligands activate Gi and G12 in bioluminescence resonance energy transfer (BRET) experiments (Namkung Y. et al. Sci. Signal. 2018). A similar observation was described in the case of CCR5, where some chemokine analogs promoted G protein subtype-specific signaling bias (Lorenzen E. et al. Sci. Signal. 2018). Structural analyses of distinct GPCRs in the presence of different ligands show that they vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Yet, these changes modify conserved motifs in the interior of the receptor core and induce common conformational changes in the intracellular site involved in signal transduction. That is, these modifications might be considered distinct receptor conformations. 

      The revised discussion contains some of these interpretations to support our statement about the stabilization of a particular receptor conformation triggered by the negative allosteric modulators. 

      (45) 6. Refinement of Binding Mode: 

      -Clarify the workflow for obtaining the binding mode, particularly the role of GLIDE and PELE. Clearly explain how these software tools were used in tandem to refine the binding mode. 

      The sequential computational workflow applied in this project included: i) protein model construction, ii) virtual screening (Glide), iii) PELE, iv) docking (AutoDock and Glide) and v) molecular dynamics (AMBER).

      Glide was applied for the structure-based virtual screening to explore which compounds could fit and interact with the previously selected binding site.

      After the identification of theoretically active compounds (modulators of CXCR4), additional calculations were done to identify a potential binding site. PELE was used for this purpose, to study how the compounds could bind over the whole surface of the target (TMV-TMVI). By applying PELE, we avoided biasing the calculation, and we found that the trajectories with better interaction energies identified the cleft between TMV and TMVI as the binding site for AGR1.135 and AGR1.137, and not for AGR1.131. AGR1.131 showed a pose with low binding energy, -39.8 kcal/mol, between TMV and TMVI helices, that is, it might interact with CXCR4 in the selected area for the screening. But it also showed a better pose placed between helices TMI and TMVII, -43.7 kcal/mol (see our response to point 41). These data have now been confirmed using Schrödinger’s MM-GBSA procedure (see our response to points 6 and 8). In any case, the compound was included in the biological screening, where it was unable to affect CXCL12-mediated chemotaxis (Fig. 1B). Docking and MD simulations were then performed to study and refine the specific binding mode in this cavity. These data were important to choose the mutations on CXCR4 required to test whether the compounds reversed their behavior. In these experiments we also confirmed that AGR1.131 had a better pose on the TMI-TMVII region. 

      (46) 7. Impact of Compound Differences on CXCR4-F249L mutant: 

      -Provide visual aids, such as figures, and additional experiments to support the statement about differences in the behavior of AGR1.135 and AGR1.137 on cells expressing CXCR4-F249L mutant. Elaborate on the closer interaction suggested between the triazole group of AGR1.137 and the F249 residue

      At the reviewer’s suggestion, Fig. 5 has been modified to incorporate a closer view of the interactions identified and new panels in new Fig. 6 have been added to show in detail the effect of the mutations selected on the structure of the cleft between TMV and TMVI. The main difference between AGR1.135 and AGR1.137 is how the triazole group interacts with F249 and L216 (Author response image 6). In AGR1.137, the three groups are aligned in a parallel organization, which appears to be more effective. This might be due to a better adaptation of this compound to the cleft since there is only one hydrogen bond with V124. In AGR1.135, the compound interacts with the phenyl ring of F249 and has a stronger interaction at the apical edge to stabilize its position in the cleft. However, there is still an additional interaction present. When changing F249 to L (Fig. VIIA, B, only for review purposes) and showing the two most likely rotamers resulting from the mutation, it is observed that rotamer B is in close proximity to the compound, which may cause the binding to either displace or adopt an alternative conformation that is easier to bind into the cleft. As previously mentioned, it is likely that AGR1.135 can displace the mutant rotamer and bind into the cleft more easily due to its higher affinity.

      Author response image 6.

      Cartoon representation of the interaction of CXCR4 F249L mutant with AGR1.135 (A) and AGR1.137 (B). The two most probable conformations of the leucine rotamers (A and B) are represented in cyan. Van der Waals interactions are depicted in blue cyan dashed lines, hydrogen bonds in black dashed lines. CXCR4 segments of TMV and TMVI are colored in blue and pink, respectively.

      (47) In the "Materials and Methods" section, the computational approach for the "discovery of CXCR4 modulators" requires significant revision and clarification. The following suggestions aim to address the identified issues: 1. Structural Modeling: 

      -Reconsider the use of SWISS-MODEL if there is an available PDB code for the entire CXCR4 structure. Clearly articulate the rationale for choosing one method over the other and explain any limitations associated with the selected approach. 

      The SWISS-MODEL server allows automated comparative modeling of 3D protein structures and pioneered the field of automated modeling. At the time we started this project, it was the most accurate method to generate reliable 3D protein structure models.

      As explained above, we have now predicted the structure of the target using AlphaFold (Jumper J. et al., Nature 2021) and performed several additional experiments that confirm that the small compounds bind the selected pocket, as the original strategy indicated (see our response to point 6; Fig. II, only for review purposes).

      (48) 2. Parametrization of Small Compounds: 

      -Provide a detailed description of the parametrization process for the small compounds used in the study. Specify the force field and parameters employed, considering the obsolescence of AMBER14 and ff14SB. Consider adopting more contemporary force fields and parameterization strategies. 

      When we performed these experiments, some years ago, the force fields applied (ff14SB and AMBER14 in MD, or OPLS2004 in docking with Glide) were well accepted and were gold standards. It is, however, true that force fields have evolved in the past few years. Moreover, in the case of the MD simulations, to obtain the parameters of the ligands that are not contained within the force field, we performed an additional parameterization as a standard methodology. We first carried out an ab initio optimization of the ligand geometry at the B3LYP/6-311+G(d) level using Gaussian 09, Revision A.02, and then a single-point energy calculation of ESP charges at the HF/6-311+G(d) level on the optimized structure. As the last step of the parametrization, the antechamber module was used to adapt these charges and additional parameters for MD simulations.

      (49) 3. Treatment of Lipids and Membrane: 

      -Elaborate on how lipids were treated in the system. Clearly describe whether a membrane was included in the simulations and provide details on its composition and structure. Address the role of the membrane in the study and its relevance to the interactions between CXCR4 and small compounds 

      To stabilize CXCR4 and more accurately reproduce the real environment in the MD simulation, the system was embedded in a lipid bilayer using the Membrane Builder tool (Sunhwan J. et al. Biophys. J. 2009) from the CHARMM-GUI server. The membrane was composed of 175 molecules of the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) in each leaflet. The protein-membrane complex was solvated with TIP3 water molecules. Chloride ions were added up to a concentration of 0.15 M in water, and sodium ions were added to neutralize the system. This information was previously described in detail.
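      As a rough illustration of what a 0.15 M concentration means in terms of ion counts, the sketch below converts a molar concentration and a solvent volume into a number of ions. The box volume used in the example is hypothetical and is not taken from our system; in practice the CHARMM-GUI server determines the actual ion numbers.

```python
AVOGADRO = 6.02214076e23  # particles per mole

def ions_for_concentration(conc_molar, volume_A3):
    """Number of ions needed to reach `conc_molar` (mol/L) in a solvent
    volume given in cubic angstroms (1 A^3 = 1e-27 L)."""
    litres = volume_A3 * 1e-27
    return round(conc_molar * AVOGADRO * litres)

# Hypothetical example: a 100 x 100 x 100 A water box (1e6 A^3).
n_ions = ions_for_concentration(0.15, 100 * 100 * 100)
print(n_ions)  # 90
```

      Counter-ions for neutralization would then be added on top of this count, one per unit of net charge of the solute.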

      (50) 4. Molecular Dynamics Protocol: 

      -Provide a more detailed and coherent explanation of the molecular dynamics protocol. Clarify the specific steps, parameters, and conditions used in the simulations. Ensure that the protocol aligns with established best practices in the field.

      Simulations were run on an Asus 1151 h170 LVX-GTX-980Ti workstation, with an Intel Core i7-6500 K Processor (12 M Cache, 3.40 GHz) and 16 GB DDR4 2133 MHz RAM, equipped with an Nvidia GeForce GTX 980Ti available for GPU (Graphics Processing Unit) computations. MD simulations were performed using AMBER14 (Case D.A. et al. AMBER 14, Univ. of California, San Francisco, USA, 2014) with the ff14SB (Maier J.A. et al. J. Chem. Theory Comput. 2015) and lipid14 (Dickson C. J. et al. J. Chem. Theory Comput. 2014) force fields in the NPT thermodynamic ensemble (constant pressure and temperature). Minimization was performed using 3500 Steepest Descent steps and 4500 Conjugate Gradient steps three times, first considering only hydrogens, next considering only water molecules and ions, and finally minimizing all atoms. Equilibration raised the system temperature from 0 to 300 K at constant volume, with everything but ions and water molecules fixed. After thermalization, several density equilibration phases were performed. In the production phase, 50 ns MD simulations without position restraints were calculated using a time step of 2 fs. Trajectories of the most interesting poses were extended to 150 ns. All bonds involving hydrogen atoms were constrained with the SHAKE algorithm (Lippert R.A. et al. J. Chem. Phys. 2007). A cutoff of 8 Å was used for the Lennard-Jones interaction and the short-range electrostatic interactions. The Berendsen barostat (Berendsen H.J. et al. J. Chem. Phys. 1984) and the Langevin thermostat were used to regulate the system pressure and temperature, respectively. All trajectories were processed using CPPTRAJ (Roe D.R. & Cheatham III T.E. J. Chem. Theory Comput. 2013) and visualized with VMD (Visual Molecular Dynamics) (Humphrey W. et al. J. Mol. Graphics. 1996). To reduce the complexity of the data, Principal Component Analysis (PCA) was performed on the trajectories using CPPTRAJ.
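      To make two of the quantities above concrete, the following minimal sketch (not the CPPTRAJ workflow we used) shows how the number of integration steps follows from the simulation length and time step, and what a PCA of trajectory coordinates computes. It assumes NumPy and a trajectory already aligned to a reference; the function names and the synthetic trajectory are illustrative only.

```python
import numpy as np

def md_steps(length_ns, timestep_fs):
    """Number of integration steps for a run of `length_ns` nanoseconds
    with a time step of `timestep_fs` femtoseconds (1 ns = 1e6 fs)."""
    return round(length_ns * 1_000_000 / timestep_fs)

def trajectory_pca(coords, n_components=2):
    """PCA of an aligned trajectory.

    coords: array of shape (n_frames, n_atoms, 3).
    Returns the per-frame projections on the first principal components
    and the fraction of variance each of them explains.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)       # one row per frame
    centered = flat - flat.mean(axis=0)       # remove the average structure
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (n_frames - 1)        # eigenvalues of the covariance
    explained = variance / variance.sum()
    projections = centered @ vt[:n_components].T
    return projections, explained[:n_components]

print(md_steps(50, 2))    # 25000000 steps for the 50 ns production runs
print(md_steps(150, 2))   # 75000000 steps for the extended 150 ns runs
```

      In this sketch the principal components are obtained from a singular value decomposition of the centered coordinates, which is equivalent to diagonalizing the coordinate covariance matrix.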

      (51) Consider updating the molecular dynamics protocol to incorporate more contemporary methodologies, considering advancements in simulation techniques and software.

      In our answer to points 6 and 47, we describe why we used the technology based on SWISS-MODEL and PELE analysis and how we have now used AlphaFold and other more contemporary methodologies to confirm that the small compounds bind the selected pocket.

      (52) Figure 1A: 

      •  Consider switching to a cavity representation for CXCL12 to enhance clarity and emphasize the cleft.

      Fig. 1A has been modified to emphasize the cleft.

      (53) Explicitly show the TMV-TMVI cleft in the figure for a more comprehensive visualization. 

      In Fig. 1A we have added an insert to facilitate TMV-TMVI visualization.

      (54) Figure 1B: 

      •  Clearly explain the meaning of the second DMSO barplot to avoid confusion. 

      To clarify this panel, we have modified the figure and the figure legend. Panel B now includes a complete titration of the three compounds analyzed in the manuscript.  The first bar shows cell migration in the absence of both treatment with AMD3100 and stimulation with CXCL12.  The second bar shows migration in response to CXCL12 in the absence of AMD3100. The third bar shows the effect of AMD3100 on CXCL12-induced migration, as a known control of inhibition of migration.  We hope that this new representation of the data results is clearer.

      (55) Figure 1C: 

      •  Provide a clear legend explaining the significance of the green shading on the small compounds. 

      The legend for Fig. 1C has been modified according to the reviewer’s suggestion.

      (56) Figure 2: 

      •  Elaborate on the role of fibronectin in the experiment and explain the specific contribution of CD86-AcGFP.

      The ideal situation for TIRF-M determinations is to employ cells on a physiological substrate complemented with or without chemokines. Fibronectin is a substrate widely used in different studies that allows cell adhesion, mimicking a physiological situation. Jurkat cells express alpha4beta1 and alpha5beta1 integrins that mediate adhesion to fibronectin (Seminario M.C. et al. J. Leuk. Biol. 1999).

      Regarding the use of CD86-AcGFP in TIRF-M experiments, we determine the number of receptors in individual trajectories of CXCR4 using, as a reference, the MSI value of CD86-AcGFP particles that strictly showed a single photobleaching step (Dorsch S. et al. Nat Methods 2009).

      We preferred to use CD86-AcGFP in cells instead of AcGFP on glass, to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. In any case, this issue has been clarified in the revised version.
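      The idea of using a monomeric reference can be sketched as follows, under the simplifying assumption that the number of receptors in a particle is estimated as the ratio between its intensity and the reference intensity of a single AcGFP. The function name and the intensity values are hypothetical and serve only as an illustration.

```python
def receptors_per_particle(particle_msi, monomer_msi):
    """Estimate receptor number per particle as the ratio between each
    particle's intensity and the monomer reference intensity
    (here, the value that would come from CD86-AcGFP spots showing
    a single photobleaching step)."""
    if monomer_msi <= 0:
        raise ValueError("monomer reference intensity must be positive")
    return [msi / monomer_msi for msi in particle_msi]

# Hypothetical intensities in arbitrary units, for illustration only.
estimates = receptors_per_particle([100.0, 210.0, 330.0], monomer_msi=100.0)
print(estimates)
```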

      (57) Figure 3D: 

      •  Include a plot for the respective band intensity to enhance data presentation 

      The plot showing the band intensity analysis of the experiments shown in Fig. 3D was already included in the original version (see old Supplementary Fig. 3). However, in the revised version, we include these plots in the same figure as panels 3E and 3F.  As a control of inhibition of CXCL12 stimulation, we have also included a new figure (Supplementary Fig. 4) showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (58) Consider adding AMD3100 as a control for comparison. 

      In agreement with the reviewer’s suggestion, we have added the effect of AMD3100 in most of the functional experiments performed.

      (59) Figure 4: 

      •  Address the lack of positive controls in Figure 4 and consider their inclusion for a more comprehensive analysis. 

      DMSO bars correspond to the control of the experiment, as they represent the effect of CXCL12 in the absence of any allosteric modulator. As previously described in this point-by-point reply, DMSO bars correspond to the control performed with the solvent with which the small compounds, at maximum concentration, are diluted.  Therefore, they show the effect of the solvent on CXCL12 responses. In any case, and in order to facilitate the comprehension of the figure we have also added the controls in the absence of DMSO to demonstrate that the solvent does not affect CXCL12-mediated functions, together with the effect of the orthosteric inhibitor AMD3100. In addition, we have also included representative images of the effect of the different compounds on CXCL12-induced polarization (Fig. 4C).

      (60) In Figure 4A, carefully assess overlapping error bars and ensure accurate interpretation. If necessary, consider alternative representation. 

      We have tried alternative representations of the data in Fig. 4A, but in all cases the figure was unclear. We believe that the way we represent the data in the original manuscript is the clearest and most appropriate. Nevertheless, we have now included significance values as a table annexed to the figure, as well as the effect of AMD3100, as a control of inhibition.

      (61) Supplementary Figure 1A: 

      •  Improve the clarity of bar plots for better understanding. Consider reordering them from the most significant to the least. 

      This was a good idea, and therefore Supplementary Fig. 1A has been reorganized to improve clarity.

      (62) Supplementary Figure 1C: 

      •  Clarify the rationale behind choosing the 12.5 nM concentration and explain if different concentrations of CXCL12 were tested. 

      In old Supplementary Fig. 1C, we used untreated cells, that is, CXCL12 was not present in the assay.  These experiments were performed to test the potential toxicity of DMSO (solvent) or the negative allosteric modulators on Jurkat cells. The 12.5 nM concentration of CXCL12 mentioned in the figure legend applied only to panels A and B, as indicated in the figure legend. We previously optimized this concentration for Jurkat cells using different concentrations of CXCL12 between 5 and 100 nM.  Nevertheless, we have reorganized old supplementary fig. 1 and clarified the figure legend to avoid misinterpretations (see Supplementary Fig 1A, B and Supplementary Fig. 2A, B).

      (63) Explain the observed reduction in fluorescence intensity for AGR1.135. 

      The cell cycle analysis has been moved from Supplementary Fig. 1C to a new Supplementary Fig. 2.  It now includes the flow cytometry panels to show fluorescence intensity as a function of the number of cells analyzed (Panel 1A) as well as a table (panel B) with the percentage of cells in each phase of the cell cycle. We believe that the apparent reduction in fluorescence that the reviewer observes is mainly due to the number of events analyzed. However, we have changed the flow cytometry panels for others that are more representative and included a table with the mean of the different results. When we determined the percentage of cells in each cell cycle phase, we observed that it looks very similar in all the experimental conditions. That is, none of the compounds affected any of the cell cycle phases. We have also included the effect of H2O2 and staurosporine as control compounds inducing cell death and cell cycle alteration of Jurkat cells.

      (64) Supplementary Table 1: 

      •  Include a column specifying the scoring for each compound to provide a clear reference for readers. 

      To provide a clear reference for readers, we have now included the inhibitory effect of each compound on Jurkat cell migration in the revised version of this table. 

      (65) Minor Points 

      Page 2 - Abstract: Rephrase the first sentence of the abstract to enhance fluidity. 

      Although the entire manuscript was revised by a professional English editor, we appreciate the valuable comments of this reviewer and we have corrected these issues accordingly.

      (66) Page 2 - Abstract: Explicitly define "CXCR4" as "C-X-C chemokine receptor type 4" the first time it appears.

      We have not used C-X-C chemokine receptor type 4 the first time it appears in the abstract. CXCR4 is an acronym normally accepted to identify this chemokine receptor, and it is used as CXCR4 in many articles published in eLife. However, we introduce the complete name the first time it appears in the introduction.

      (67) Page 2 - Abstract: Explicitly define "CXCL12" as "C-X-C motif chemokine 12" the first time it is mentioned. 

      As we have discussed in the previous response, we have not used C-X-C motif chemokine 12 the first time CXCL12 appears in the abstract, as it is a general acronym normally accepted to identify this specific chemokine, even in eLife papers. However, we introduce the complete name the first time it appears in the introduction section.

      (68) Page 2 - Abstract: Explicitly define "TMV and TMVI" upon its first mention.

      The acronym TM has been defined as “Transmembrane” in the revised version.

      (69) Page 2 - Abstract: Review the use of "in silico" in the sentence for accuracy and consider revising if necessary.

      With the term “in silico” we want to refer to those experiments performed on a computer or via computer simulation software. We have carefully reviewed its use in the new version of the manuscript.

      (70) Page 2 - Abstract: Add a comma after "compound" in the sentence, "We identified AGR1.137, a small compound that abolishes...".

      A comma after “compound” has been added in the revised sentence.

      (71) Page 2 - Significance Statement: Rephrase the first sentence of the "Significance Statement" to avoid duplication with the abstract.

      The first sentence of the Significance Statement has been revised to avoid duplication with the abstract. 

      (72) Page 2 - Significance Statement: Break down the lengthy sentence, "Here, we performed in silico analyses..." for better readability. 

      The sentence starting with “Here, we performed in silico analyses…” has been broken down in the revised manuscript.

      (73) Page 2 - Introduction: Replace "Murine studies" with a more specific term for clarity.

      The term “murine studies” is normally used to refer to experimental studies developed in mice. We have nonetheless rephrased the sentence.

      (74) Page 3 - Introduction: Rephrase the sentence for clarity: "Finally, using a zebrafish model, ..."

      The sentence has been now rephrased for clarity.

      (75) Results-AGR1.135 and AGR1.137 block CXCL12-mediated CXCR4 nanoclustering and dynamics: 

      Rephrase the sentence for clarity: "Retreatment with AGR1.135 and AGR1.137, but not with AGR1.131, substantially impaired CXCL12-mediated receptor nanoclustering.”

      The sentence has been rephrased for clarity.

      (76) Results - AGR1.135 and AGR1.137 incompletely abolish CXCR4-mediated responses in Jurkat cells: Clarify the sentence: "In contrast to the effect promoted by AMD3100, a binding-site antagonist of CXCR4..."

      The sentence has been modified for clarity.

      (77) Consider using "orthosteric" instead of "binding-site" antagonist.

      The term orthosteric is now used throughout to refer to a binding site antagonist.

      (78) Discussion: Use the term "in silico" only when necessary.

      We have carefully reviewed the use of “in silico” in the manuscript.

      (79) Discussion: Clarify the sentence: "...not affect neither CXCR2-mediated cell migration...". Confirm if "CXCL12" is intended.

      The sentence refers to the chemokine receptor CXCR2, which binds the chemokine CXCL2. To test the specificity of the compounds for the CXCL12/CXCR4 axis, we evaluated CXCL2-mediated cell migration.  The results indicated that CXCL2/CXCR2 axis was not affected by the negative allosteric modulators, whereas CXCL12-mediated cell migration was blocked.  The sentence has been clarified in the new version of the manuscript.

      (80) Figure 4B: Bold the "B" in the figure label for consistency.

      The “B” in Fig. 4B has been bolded.

      Reviewer #2

      (1) Fig 2. The SPT data is sub-optimal in its presentation as well as analysis. Example images should be shown. The analysis and visualization of the data should be reconsidered for improvements. Graphs with several hundreds, in some conditions over 1000 tracks, per condition are very hard to compare. The same (randomly selected representative set) number of data points should be shown for better visualization. Also, more thorough analyses like MSD or autocorrelation functions are lacking - they would allow enhanced overall representation of the data.

      In agreement with the reviewer’s comment, we have modified the representation of Fig. 2. We have carefully read the paper by Lord et al. (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations for this type of data. We have also included, as supplementary material, representative videos of the TIRF-M experiments to allow readers to visualize the original images. Regarding the MSD analyses, they were performed to determine all D1-4 values. According to Manzo & García-Parajo (Manzo C. & García-Parajo M.F. Rep. Prog. Phys. 2015), due to the finite trajectory length, the MSD curve at large tlag has poor statistics and deviates from linearity. However, the diffusion coefficient (D1-4) can be estimated by fitting the short-tlag region of the MSD plot, which gives a more accurate idea of the behavior of the particles. Accordingly, we show D1-4 values and not MSD data. 

      Due to the space restrictions, it is very difficult to include all the figures generated, but, only for review purposes, we included in this point-by-point reply some representative plots of the MSD values as a function of the time from individual trajectories showing different types of motion obtained in our experiments (Author response image 7).

      Author response image 7.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP showing different types of motion: A) confined, B) Brownian/Free, C) direct transport of CXCR4-AcGFP particles diffusing at the cell membrane detected by SPT-TIRF in resting JKCD4 cells.
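      The short-time-lag estimate of the diffusion coefficient described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: it assumes two-dimensional trajectories sampled at a constant frame interval and a linear fit MSD = 4·D·tlag + offset over the first four time lags. The function names and the simulated trajectory are illustrative.

```python
import numpy as np

def msd(track, max_lag):
    """Time-averaged mean square displacement for lags 1..max_lag.

    track: array of shape (n_frames, 2) with x, y positions.
    """
    track = np.asarray(track, dtype=float)
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]            # displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return out

def d_1_4(track, frame_interval_s):
    """Short-time-lag diffusion coefficient from a linear fit of the
    first four MSD points (2D motion: MSD = 4 * D * tlag + offset)."""
    lags = np.arange(1, 5) * frame_interval_s
    slope, _offset = np.polyfit(lags, msd(track, 4), 1)
    return slope / 4.0

# Simulated free diffusion with a known coefficient, for illustration only.
rng = np.random.default_rng(1)
true_d, dt = 0.1, 0.1                                 # um^2/s and s (hypothetical)
steps = rng.normal(scale=np.sqrt(2 * true_d * dt), size=(20000, 2))
track = np.cumsum(steps, axis=0)
print(d_1_4(track, dt))
```

      Restricting the fit to the first lags avoids the poorly sampled long-lag region of the curve, at the cost of describing only short-range mobility.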

      Further analysis, such as the classification based on particle motion, has not been included in this article. This classification uses the moment scaling spectrum (MSS), described by Ewers H. et al. 2005 PNAS, and requires particles with longer trajectories (>50 frames). Only for review purposes, we include a figure showing the percentage of the MSS-based particle motion classification for each condition. As expected, most particles with long trajectories are confined, with a slight increase in the percentage upon CXCL12 stimulation in all conditions, except in cells treated with AGR1.137 (Author response image 8).

      Author response image 8.

      Effects of the negative allosteric modulators on the Types of Motion of CXCR4. Percentage of single trajectories with different types of motion, classified by MSS (DMSO: 58 particles in 59 cells on FN; 314 in 63 cells on FN+CXCL12; AGR1.131: 102 particles in 71 cells on FN; 258 in 69 cells on FN+CXCL12; AGR1.135: 86 particles in 70 cells on FN; 120 in 77 cells on FN+CXCL12; AGR1.137: 47 particles in 66 cells on FN; 74 in 64 cells on FN+CXCL12) n = 3.
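      For readers unfamiliar with the MSS, the sketch below shows one way its slope can be computed for a single trajectory. It is a simplified illustration and not the implementation used here: it assumes moments of order 0 to 6 and time lags up to one third of the trajectory length, and it leaves out the thresholds used to assign motion classes. A slope near 0.5 corresponds to free diffusion, lower values to confined motion, and values approaching 1 to directed motion.

```python
import numpy as np

def mss_slope(track, max_lag=None, max_moment=6):
    """Slope of the moment scaling spectrum of a 2D trajectory.

    For each moment order nu, the mean of |displacement|**nu is computed
    at every lag and its scaling exponent gamma_nu is obtained from a
    log-log fit. The MSS slope is the slope of gamma_nu versus nu.
    Note: a fully immobile track gives zero moments and cannot be fitted.
    """
    track = np.asarray(track, dtype=float)
    if max_lag is None:
        max_lag = len(track) // 3                  # assumed lag range
    lags = np.arange(1, max_lag + 1)
    moments = np.arange(0, max_moment + 1)
    gammas = []
    for nu in moments:
        mu = [
            np.mean(np.linalg.norm(track[lag:] - track[:-lag], axis=1) ** nu)
            for lag in lags
        ]
        gamma, _ = np.polyfit(np.log(lags), np.log(mu), 1)
        gammas.append(gamma)
    slope, _ = np.polyfit(moments, gammas, 1)
    return slope

# Directed motion at constant velocity, for illustration only.
directed = np.column_stack([np.arange(100) * 0.5, np.zeros(100)])
print(mss_slope(directed))
```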

      (2) Fig 3. The figure legends have inadequate information on concentrations and incubation times used, both for the compounds and other treatments like CXCL12 and forskolin. For the Western blot data, also the quantification should be added to the main figure. The compounds, particularly AGR1.137 seem to lead to augmented stimulation of pAKT and pERK. This should be discussed

      The Fig. 3 legend has been corrected in the revised manuscript. Fig. 3D now contains representative western blots and the densitometry evaluation of these experiments. As the reviewer indicates, we also detected in the western blot included, augmented stimulation of pAKT and pERK in cells treated with AGR1.137. However, as shown in the densitometry analysis, no significant differences were noted between the data obtained with each compound. As a control of inhibition of CXCL12 stimulation we have included a new Supplementary Fig. 4 showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (3) Fig. 4 immunofluorescence data on polarization as well as the flow chamber data lack the representative images of the data. The information on the source of the T cells is missing. Not clear if this experiment was done on bilayers or on static surfaces.

      Representative images for the data shown in Figure 4B have been added in the revised figure (Fig. 4C). The experiments in Fig. 4B were performed on static surfaces. As indicated in the material and methods section, primary T cell blasts were added to fibronectin-coated glass slides and then were stimulated or not with CXCL12 (5 min at 37ºC) prior to fixing, permeabilizing, and staining them with phalloidin. Primary T cell blasts were generated from PBMCs isolated from buffy coats that were activated in vitro with IL-2 and PHA, as indicated in the material and methods section.

      (4) The data largely lacks titration of different concentrations of the compounds. How were the effective concentration and treatment times determined? What happens at higher concentrations? It is important to show, for instance, if the CXCR12 binding gets inhibited at higher concentrations. most experiments were performed with 50 uM, but HeLa cell data with 100 uM. Why and how was this determined? 

      The revised version contains a new panel in Fig. 1B to show a more detailed titration with different concentrations (1-100 µM) of the compounds in the migration experiments using Jurkat cells. We chose 50 µM for further studies as it was the concentration that inhibited 50-75% of the ligand-induced cell migration. 

      We have also included the effect of two doses of the compounds (10 and 50 µM) in the zebrafish model, as well as AMD3100 (1 and 10 µM) as a control (new Fig. 7D, E). Tumors were imaged within 2 hours of implantation, and tumor-bearing embryos were treated with either vehicle (DMSO) alone, AGR1.131 or AGR1.137 at 10 and 50 µM, or AMD3100 at 1 and 10 µM for three days, followed by re-imaging.

      Regarding the amount of CXCL12 used in these experiments, with the exception of cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, in all the other experiments the optimal concentration of CXCL12 employed was 50 nM. In the case of the directional cell migration assays, we used 100 nM to create the chemokine gradient in the device. These concentrations have been optimized in previous work from our laboratory using these types of experiments. It should also be noted that in the experiments using lipid bilayers or TIRF-M, CXCL12 is used to coat the plates, and therefore it is difficult to determine the real concentration that is retained on the surface after the washing steps performed prior to adding the cells.

      (5) The authors state that they could not detect direct binding of the compounds and the CXCR14. It should be reported what approaches were tried and discussed why this was not possible. 

      We attempted a fluorescence spectroscopy strategy to formally prove the ability of AGR1.135 to bind CXCR4, but this strategy failed because the compound has a yellow color that interfered with the determinations. We also tried a FRET strategy (see supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers in cells treated with AGR1.135; this effect was due to the yellow color of this compound that interferes with FRET determinations. In the same assays, AGR1.137 did not modify FRET efficiency for CXCR4 homodimers and therefore we cannot assume that AGR1.137 binds on CXCR4. All these data have been considered in the revised discussion.

      (6) The proliferation data in Supplementary Figure 1 lacks controls that affect proliferation and indication of different cell cycle stages. What is the conclusion of this data? More information on the effects of the drug to cell viability would be important.

      Toxicity in Jurkat cells was first determined by propidium iodide incorporation. Some compounds (i.e., AGR1.103 and VSP3.1) were discarded from further analysis as they were toxic for cells. In a deeper analysis of cell toxicity, even if these compounds did not kill the cells, we checked whether they could alter the cell cycle of the cells. New Supplementary Fig. 2 includes a table (panel B) with the percentage of cells in each cell cycle phase, and no differences between any of the treatments tested were detected. 

      Nevertheless, to clarify this issue the revised version of the figure also includes H2O2 and staurosporine stimuli to induce cell death and cell cycle alterations as controls of these assays.

      (7) The flow data in Supplementary Figure 2 should be statistically analysed. 

      Bar graphs corresponding to the old Supplementary Fig. 2 (new Supplementary Fig. 3) are shown in Fig. 3B. We have also incorporated the corresponding statistical analysis to this figure. 

      (8) In general, the authors should revise the figure legends to ensure that critical details are added. 

      We have carefully revised all the figure legends in the new version of the manuscript.

      (9) Bar plots are very poor in showing the heterogeneity of the data. Individual data points should be shown whenever feasible. Superplot-type of representation is strongly advised (https://doi.org/10.1083/jcb.202001064).

      We have carefully read the paper by Lord et al. (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations to our TIRF-M data (see revised Fig. 2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Liu et al explores the role of the UPR and immune regulators in the evaluation of nutritional quality in C. elegans. They identify neuronal UPR activation and the MAPK PMK-1 as key responders to low food quality. In particular, the data suggest that these pathways are activated by low levels of vitamin C synthesis that result from the low sugar levels present in heat-killed E. coli.

      Strengths:

      The results are intriguing and expand our understanding both of physiological food evaluation systems, and of the known roles of stress response pathways in organismal physiology. The authors use a range of techniques, encompassing imaging, metabolomic analysis, gene expression analysis, and behavioural assays, to support their claims.

      Thank you for your thorough review and acknowledgment of the strengths of our study.

      Weaknesses:

      There is limited mechanistic analysis in the study. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding that, if followed up, could potentially reveal a novel mechanism of UPR activation. In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation? The data in some figures is not as convincing as it could be: the magnitude of the effect size is small in the supplementation experiments, and the statistical tests used are not always appropriate to enable multiple comparisons.

      (1) There is limited mechanistic analysis in the study. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding that, if followed up, could potentially reveal a novel mechanism of UPR activation. 

      Thank you for highlighting the need for further mechanistic analysis in our study. We appreciate the opportunity to clarify the process by which low vitamin C triggers UPR activation.

      Our investigation revealed that the vitamin C content in heat-killed E. coli (HK-E. coli) is comparable to that of live E. coli or HK-yfbR mutant E. coli (Figure 4-figure supplement 1A), indicating that the induction of unfolded protein response (UPR) in C. elegans by HK-E. coli is not solely attributed to low vitamin C levels but rather involves other unidentified factors.

      Through metabolomic analysis, we observed significant decreases in sugar levels, including lactose, D-(+)-sucrose, and D-(+)-glucose, in HK-E. coli (Figure 3B, Table S1). Notably, supplementing D-(+)-glucose effectively inhibited UPRER, immune response, and avoidance behavior induced by HK-E. coli (Figure 3E-H). These findings suggest that the deficiency in sugars in HK-E. coli triggers a stress response and avoidance behavior in animals, which can be alleviated by D-(+)-glucose supplementation.

      Furthermore, when comparing heat-killed E. coli mutant yfbR (HK-yfbR) to HK-E. coli, we observed significantly higher sugar levels, including lactose and D-(+)-sucrose, in HK-yfbR (Figure 3B). This was accompanied by reduced UPRER in animals feeding on HK-yfbR (Figure 3-figure supplement 1B), indicating that higher sugar levels may inhibit the induction of UPRER by low-quality food.

      Considering that the synthesis of vitamin C (VC) occurs through the glucuronate pathway, utilizing D-glucose as a precursor [1, 2] (Figure 4A), we investigated whether the vitamin C biosynthesis pathway is involved in evaluating low-quality food using D-glucose. Contrary to our initial hypothesis, animals fed live E. coli did not exhibit higher glucose levels compared to those fed low-quality food (HK-E. coli). Our results indicate that animals maintain similar VC levels when fed ideal food (live E. coli) compared to low-quality food (HK-E. coli) (Figure 4B), suggesting that animals do not stimulate VC biosynthesis under favorable food conditions. However, supplementation of D-GlcA or E. coli-yfbR mutation in HK-E. coli significantly improved VC levels when animals were fed low-quality food (HK-OP50) (Figure 4B, 4C). Moreover, VC or D-glucuronate (D-GlcA) supplementation inhibited HK-E. coli-induced UPRER (Figure 4D), indicating that glucose boosts the animal's ability to adapt to unfavorable food environments by increasing VC levels, thereby inhibiting UPRER, but not under favorable food conditions.

      These findings shed light on the complex interplay between vitamin C, sugar levels, and UPR activation, providing valuable insights into the mechanisms underlying food evaluation and stress response pathways in organisms.

      Overall, we are grateful for the reviewer's constructive feedback, which motivates us to continue our efforts to understand how the UPR response contributes to the complexities of food evaluation and behavioral responses in organisms.

      (2) In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation?

      Thank you for your insightful inquiry. In our discussion section, we have addressed this question by integrating new data and discussion to provide insights into the coordination between PMK-1 pathway activation and UPR activation.

      Previous studies have demonstrated that activating innate immunity, specifically the PMK-1 MAPK pathway, results in a reduction in translation3, as well as a shutdown of food digestion in animals4, likely aimed at reducing protein translation and cellular metabolism. To further investigate this relationship, we measured the translation level of animals fed with heat-killed E. coli (HK-E. coli) and found a significant reduction in total translation ability in these animals (Figure 5-figure supplement 1D). This observation suggests that activating innate immunity through the PMK-1 MAPK pathway may serve as a mechanism to slow down translation progress, thereby alleviating the pressure on the unfolded protein response (UPR) and preventing excessive UPRER activation.

      By integrating these findings, we propose a model wherein activation of the PMK-1 pathway coordinates with UPR activation to regulate translation and cellular metabolism in response to low-quality food. This coordinated response likely serves to maintain cellular homeostasis and prevent detrimental effects associated with excessive UPRER activation.

      These insights contribute to our understanding of the intricate interplay between innate immunity, cellular stress responses, and metabolic regulation in organisms facing nutritional challenges.

      (3) The data in some figures is not as convincing as it could be: the magnitude of the effect size is small in the supplementation experiments, and the statistical tests used are not always appropriate to enable multiple comparisons.

      We appreciate the reviewers' concerns regarding the data presentation and statistical analyses in some of our figures. In response to this feedback, we have made revisions to improve the robustness and clarity of our statistical methods.

      All statistical analyses were conducted using GraphPad Prism 8.0 software. Specifically, a two-tailed unpaired t-test was employed for the statistical analysis of two groups of samples, while one-way or two-way ANOVA was utilized for the statistical analysis of more than two groups of samples. These adjustments ensure appropriate statistical comparisons and enhance the reliability of our findings.
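      The one-way ANOVA mentioned above can be sketched as follows. This is a minimal illustration in pure Python and not the GraphPad Prism workflow used in the study. The function name `one_way_anova_f` and the three groups of values are hypothetical and are not data from the manuscript. The sketch computes only the F statistic and its degrees of freedom; obtaining a p-value or running post-hoc multiple-comparison tests would require a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three conditions (illustration only).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

      A single F test across all groups avoids the inflated false-positive rate that arises from running a separate t-test on every pair of conditions.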

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors aim to better understand how C. elegans detects and responds to heat-killed (HK) E. coli, a low-quality food. They find that HK food activates two canonical stress pathways, ER-UPR, and innate immunity, in the nervous system to promote food aversion. Through the creative use of E. coli genetics and metabolomics, the authors provide evidence that the altered carbohydrate content of HK food is the trigger for the activation of these stress responses and that supplementation of HK food with sugars (or their biosynthetic product, vitamin C), reduces stress pathway induction and food avoidance. This work makes a valuable addition to the literature on metabolite detection as a mechanism for the evaluation of nutritional value; it also provides some new insight into the physiologically relevant roles of well-known stress pathways in modulating behavior.

      Strengths:

      -The work addresses an important question by focusing on understanding how the nervous system evaluates food quality and couples this with behavioral change.

      -The work takes full advantage of the tools available in this powerful system and builds on extensive previous studies on feeding behavior and stress responses in C. elegans.

      -Creative use of E. coli genetics and metabolite profiling enabled the identification of carbohydrate metabolism as a candidate source of food-quality signals.

      -For the most part, the studies are rigorous and logically designed, providing good support for the authors' model.

      We deeply appreciate the reviewer's insightful assessment of our study's strengths. 

      Weaknesses:

      -It is not clear how the mechanism identified here is connected to previously described, related processes. In particular, it is not clear whether this mechanism has a role in the detection of other low-quality foods. Further, the specificity of the ability of sugar/vitamin C to suppress stress pathway induction is unclear (i.e., does sugar/vitamin C have any effect on the activation of these pathways through other means?). Additionally, the relationship of this pathway to the vitamin B2-sensing mechanism previously described by the senior author is unclear. These issues do not weaken confidence in the authors' conclusions, but they do reduce the potential significance of the work.

      (1) In particular, it is not clear whether this mechanism has a role in the detection of other low-quality foods. 

      Thank you for your valuable feedback. In response to your inquiry, we investigated whether the UPRER (IRE-1/XBP-1) - Innate immunity (PMK-1/p38 MAPK) axis is specific to evaluating low-quality food (HK-E. coli) or if it plays a broader role in food detection.

      We conducted behavioral assays using N2, pmk-1, and xbp-1 mutant animals fed with normal E. coli food, inedible food (Saprophytic staphylococci)4, and pathogenic food (Pseudomonas aeruginosa-PA14)5. We found that N2, pmk-1, and xbp-1 mutant worms did not exhibit avoidance behavior when presented with normal food (OP50). However, both N2 and xbp-1 mutant worms were able to escape from the inedible food, Saprophytic staphylococci (N2 worms were found predominantly on the border of the bacterial lawn, and xbp-1 mutant worms both on the border and inside the lawn), whereas pmk-1 mutant worms did not exhibit this avoidance behavior. Notably, N2 and xbp-1 mutant worms exhibited even more pronounced avoidance behavior when exposed to Pseudomonas aeruginosa, whereas pmk-1 mutant worms were more susceptible to infection by this pathogen (Figure 2-figure supplement 2C). These findings suggest that the UPR-Immunity pathway plays a crucial role in helping animals avoid low-quality food (HK-E. coli) by triggering an avoidance response. In contrast, the Innate immunity pathway, mediated by PMK-1/p38 MAPK, appears to play a key role in evaluating unfavorable food sources, such as HK-E. coli, Saprophytic staphylococci, and Pseudomonas aeruginosa, and helping animals avoid these environments.

      (2) Further, the specificity of the ability of sugar/vitamin C to suppress stress pathway induction is unclear (i.e., does sugar/vitamin C have any effect on the activation of these pathways through other means?). 

      Thank you for your inquiry regarding the specificity of the ability of sugar/vitamin C to suppress stress pathway induction. We aimed to address this question by investigating whether high levels of VC inhibit other stress-induced UPRER pathways.

      Previous studies have shown that both Tunicamycin6 and pathogenic bacteria, such as Pseudomonas aeruginosa-PA145, induce UPRER in C. elegans. In response to your query, we conducted experiments to examine whether VC supplementation inhibits UPRER induced by these stressors. Our findings indicate that VC supplementation does not inhibit UPRER induced by either Tunicamycin or PA14 (Author response image 1).

      These results suggest that while sugar/vitamin C may suppress stress pathway induction in the context of low-quality food, its effects may not extend to other stressors that induce UPRER through different mechanisms. This insight helps clarify the specificity of sugar/vitamin C's role in modulating stress pathway activation, contributing to a better understanding of the broader regulatory networks involved in stress response in C. elegans.

      Author response image 1.

      VC supplementation does not inhibit Tunicamycin or PA14-induced UPRER.

      (3) Additionally, the relationship of this pathway to the vitamin B2-sensing mechanism previously described by the senior author is unclear.

      In response to your comment, we would like to clarify the relationship of our pathway to the vitamin B2-sensing mechanism we described previously. Previous studies have demonstrated that heat-killed E. coli (HK-E. coli) serves as a low-quality food source incapable of supporting the growth of C. elegans larvae, whereas supplementation with vitamin B2 (VB2) can restore animal growth7.

      This study investigates the role of sugar deficiency in HK-E. coli, which induces the UPRER-immune response and avoidance behavior in C. elegans. Surprisingly, our findings indicate that supplementing HK-E. coli with carbohydrates such as D-Glc and D-GlcA does not promote animal development (Figure 3-figure supplement 2G), suggesting that carbohydrates are not essential for supporting animal growth on this food source. However, we did observe that carbohydrates play a critical role in inhibiting the UPRER-immune response induced by sugar deficiency in HK-E. coli.

      -The authors claim that the induction of the innate immune pathway reporter irg-5::GFP is "abolished" in pmk-1(RNAi) animals, but Figure S2K seems to show a clear GFP signal when these animals are fed HK-OP50. Similarly, the claim that feeding WT animals HK-OP50 enriches phospho-PMK-1 levels (Fig 2E) is unconvincing - only one western blot is shown, with no quantification, and there is a smear in the critical first lane.

      (1) The authors claim that the induction of the innate immune pathway reporter irg-5::GFP is "abolished" in pmk-1(RNAi) animals, but Figure S2K seems to show a clear GFP signal when these animals are fed HK-OP50. 

      We sincerely appreciate the reviewer's attention. To address this concern, we have replaced the images with higher resolution, larger ones in Figure 2-figure supplement 1-I. These updated images provide a clearer representation of the data, ensuring that all details are readily visible and enabling a more accurate interpretation of the results.

      (2) Similarly, the claim that feeding WT animals HK-OP50 enriches phospho-PMK-1 levels (Fig 2E) is unconvincing - only one western blot is shown, with no quantification, and there is a smear in the critical first lane.

      Thank you. Following the reviewer’s suggestion, we repeated some of the western blots. We have now replaced Figure 2E and quantified the relative intensity of pPMK-1/tubulin. We also provide the uncropped western blot images as source data (“raw-data WB” file).

      -The rationales for some of the paper's hypotheses could be improved. For example, the rationale for screening the E. coli mutant library is that some mutants, when heat-killed, may be missing a metabolite that induces the ER-UPR. A more straightforward hypothesis might be that some mutant E. coli strains aberrantly induce the ER-UPR when *not* heat-killed, because they are missing a metabolite that prevents stress pathway induction. This is not in itself a major concern, but it would be useful for the authors to provide a rationale for their hypothesis.

      Thank you for the insightful suggestion. We acknowledge the importance of providing a clear rationale for our hypotheses in the paper. In response to this feedback, we have enhanced the discussion section to better elucidate the rationale behind our hypotheses.

      One limitation of our study is the lack of explanation for why HK-E. coli activates UPRER and immunity. We hypothesized that when heat-killed, HK-E. coli may lack or contain altered levels of certain metabolites that either activate or inhibit UPRER and immunity, respectively. Additionally, we speculated that E. coli mutants killed by heat may lack metabolites that activate UPRER and immunity, or conversely, have increased levels of metabolites that inhibit these pathways.

      Fortunately, our investigation led to the discovery of the E. coli mutant yfbR, which inhibits UPRER and immunity by increasing carbohydrates that aid in resisting these stress pathways. Moving forward, we intend to further explore the intricate relationship between HK-E. coli and UPRER-immunity. This will be a key focus of our future research efforts.

      -The authors do not provide any explanation for some unexpected results from the E. coli screen. Earlier in the paper, the authors found that innate immune signaling is downstream of ER-UPR activation. However, of the 20 E. coli mutants that, when heat-killed, "did not induce... the UPR-ER reporter," 9 of them still activate the innate immune response. This seems at odds with the authors' simple model since it suggests that low-quality food can induce innate immune signaling independently of the ER-UPR. Further, only one of the 9 has an effect on behavior, even though failure to activate the innate immune pathway might be expected to lead to a behavioral defect in all of these.

      Thank you for your understanding, and we apologize for any confusion caused by our earlier statement. To provide clarification, our study revealed that out of the 20 E. coli mutants examined, none activated the UPRER. Among these mutants, 9 did not induce immunity, and interestingly, one out of these 9 mutants demonstrated the ability to inhibit avoidance behavior.

      This diversity in phenotypic outcomes can be attributed to the varied metabolites present in different E. coli mutants. To thoroughly evaluate the effects of these mutants, we conducted a comprehensive three-step screening process, utilizing UPRER marker, immunity marker, and avoidance behavior assays.

      Through this rigorous approach, we identified the E. coli mutant, yfbR, which exhibited the desired inhibitory effects on UPRER, immunity, and avoidance behavior.

      Subsequently, we conducted a metabolomics analysis of various food qualities (HK-K12, HK-yfbR, and Live-K12). Our findings revealed higher sugar levels in HK-yfbR and Live-K12 compared to HK-K12 (Figure 3B, Figure 3-figure supplement 2A, and Table S1), indicating that sugar deficiency might trigger the UPRER, immunity responses, and subsequent avoidance behavior.

      -In a number of places, the writing style can make the authors' arguments difficult to follow.

      We thank the reviewer for these efforts. We have corrected these errors and polished the language of the paper.

      -Some of the effect sizes observed by the authors are exceedingly small (e.g, the suppression of hsp-4::gfp induction by sugar supplementation in Figs 3C-E), raising some concern about the biological significance of the effect.

      Thank you for your feedback. In response to your concern, we have included additional clarification in the manuscript.

      We have added the following statement: “While sugar effectively inhibits the HK-E. coli-induced UPRER and immune response, it does not fully suppress it to the extent observed with live-E. coli (Figure 3C-F). This implies that additional nutrients present in live-E. coli might also contribute to the inhibition of UPRER and immune response.”

      This addition helps to address the observation that some effect sizes appear small, providing context and suggesting potential factors that may influence the outcomes. 

      -In some cases, there is a discrepancy between the fluorescence images and their quantitation (e.g., Figure 3E, where the effect of glucose on GFP fluorescence seems much stronger in the image than in the graph).

      Thank you for your valuable suggestion. In response, we have revised our image selection process to ensure impartiality. We now randomly select images to ensure they accurately represent the quantified data without bias. More details regarding this update can be found in Author response image 2.

      Author response image 2.

      Additional original images corresponding to Figure 3E.

      Reviewer #3 (Public Review):

      Summary:

      Animals can evaluate food quality in many ways. In contrast to the rapid sensory evaluation with smell and taste, the mechanism of slow nutrient sensation and its impact on food choice is unexplored. The authors utilize C. elegans larvae and their bacterial food as an elegant model to tackle this question and reveal the detailed molecular mechanism to avoid nutrient-poor foods.

      Strengths:

      The strength of this study is that they identified the molecular identities of the critical players in bacterial food and C. elegans using unbiased approaches, namely metabolome analysis, E. coli mutant screening, and RNA sequencing. Furthermore, they strengthen their findings by thorough experiments combining multiple methods such as genetics, fluorescent reporter analysis, and Western blot.

      Thank you for highlighting the strengths of our study. 

      Weaknesses:

      The major caveat of this study is the reporter genes. The transcriptional reporters were used to monitor the UPRER and immune responses in the intestine of C. elegans. However, their tissue-specific rescue experiments suggest that the genes in the UPRER and immune response function in the neurons. Thus, we should carefully interpret the results of the reporter genes.

      Thank you for your insightful comment. We appreciate the opportunity to address your concerns regarding the interpretation of our reporter gene data.

      Upon reevaluation, we observed strong induction of the UPRER reporter (Phsp-4::GFP)8 and immunity reporter (Pirg-5::GFP)9 both in the intestine (Figure 1F-G) and in neurons (Figure 1-figure supplement 2A) in response to feeding unfavorable food (HK-E. coli). This suggests that both the UPRER and immune pathways may indeed respond to low-quality food (HK-E. coli) in multiple tissues of C. elegans. While we acknowledge that our tissue-specific rescue experiments suggest a role for these pathways in neurons, the intestinal fluorescence of Phsp-4::GFP or Pirg-5::GFP is easily observable and scorable. Therefore, we chose to focus our further analyses on the intestine for practical reasons.

      Overall, this work provides convincing data to support their model. In the C. elegans field, the behaviors of larvae are not well studied compared to adults. This work will pose an interesting question about the difference between larvae and adults in nutrition sensing in C. elegans and provide a framework and candidate molecules to be studied in other organisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major suggestions:

      (1) My major overall comment is that the paper would be substantially strengthened by more mechanistic analysis. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding and it would be important to see it more fully explored.  

      Our study revealed that the vitamin C content in HK-E. coli is comparable to that of live E. coli or HK-yfbR (Figure 4-figure supplement 1A), suggesting that the induction of unfolded protein response (UPR) in C. elegans by HK-E. coli is not attributed to low vitamin C levels, but rather to unknown factors.

      Metabolomic analysis showed that the sugar levels, including lactose, D-(+)-sucrose, and D-(+)-glucose, were significantly decreased in HK-E. coli (Figure 3B, Table S1).

      Furthermore, we found that supplementing D-(+)-glucose effectively inhibited UPRER (Figure 3E), immune response (Figure 3F, 3G, and Figure 3-figure supplement 2D), and avoidance behavior (Figure 3H) induced by HK-E. coli. Our findings suggest that the deficiency in sugars in HK-E. coli triggers a stress response and avoidance behavior in animals, which can be alleviated by D-(+)-glucose supplementation.

      Notably, when E. coli was heat-killed, we observed that the sugar levels, including lactose and D-(+)-sucrose, were significantly higher in the heat-killed E. coli mutant yfbR (HK-yfbR) compared to HK-E. coli (Figure 3B). Moreover, we found that UPRER was reduced in animals feeding HK-yfbR (Figure 3-figure supplement 1B), indicating that higher sugar levels may inhibit the induction of UPRER by low-quality food.

      The synthesis of vitamin C (VC) occurs through the glucuronate pathway, utilizing D-glucose as a precursor 1, 2 (Figure 4A). This led us to investigate whether the vitamin C biosynthesis pathway is involved in evaluating low-quality food by using D-glucose. In this study, we found that animals fed live E. coli, which should produce more VC, exhibit higher glucose levels. However, our results show that animals maintain similar VC levels when fed ideal food (live E. coli) compared to low-quality food (HK-E. coli) (Figure 4B), suggesting that animals do not stimulate VC biosynthesis under favorable food conditions. In contrast, when animals are fed low-quality food (HK-OP50), we found that supplementing D-GlcA (Figure 4C) or E. coli-yfbR mutation (Figure 4B) in HK-E. coli can improve VC levels. Moreover, we found that VC or D-glucuronate (D-GlcA) supplementation inhibited HK-E. coli-induced UPRER (Figure 4D). These data indicate that glucose boosts the animal's ability to adapt to unfavorable food environments by increasing VC levels, thereby inhibiting UPRER, but not in favorable food conditions.

      In addition, we asked whether a high level of VC inhibits UPRER induced by other stresses. Previous studies have shown that Tunicamycin6 and the pathogenic bacterium Pseudomonas aeruginosa-PA145 induce UPRER in C. elegans. We found that VC supplementation does not inhibit Tunicamycin- or PA14-induced UPRER (Author response image 3).

      Author response image 3.

      VC supplementation does not inhibit Tunicamycin or PA14-induced UPRER.

      In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation? 

      If the authors do not want to pursue these directions experimentally in this study, the discussion would be strengthened by considering these questions and identifying candidate regulatory mechanisms for further exploration.

      In this study, we found that heat-killed E. coli (HK-E. coli), a low-sugar food, triggers the cellular unfolded protein response (UPRER) and immune response. We also demonstrated that 1) the activation of UPRER by low-quality food depends on IRE-1/XBP-1, and 2) activation of the immune response (PMK-1) is downstream of XBP-1 in responding to low-quality food.

      How is the activation of the PMK-1 pathway driven by/coordinated with UPR activation?

      In the Discussion section, we added new data and discussion to address the reviewer’s question.

      A previous study has shown that activating innate immunity (PMK-1 MAPK) leads to a reduction in translation 3. Our own previous research has also demonstrated that PMK-1 activation causes a shutdown of food digestion in animals4, likely to reduce protein translation and cellular metabolism. To investigate this further, we measured the translation level of animals fed with HK-E. coli and found that total translation ability is significantly reduced in these animals (Figure 5-figure supplement 1D). This finding suggests that activating innate immunity (PMK-1 MAPK) may serve as a mechanism to slow down translation progress, thereby alleviating the pressure on the unfolded protein response (UPR) and preventing excessive UPRER activation.

      (2) Figure 2C: The data shows that xbp-1 mutants are significantly more likely to leave heat-killed E. coli. However, no other conditions are examined. Is this avoidance defect specific to heat-killed E. coli, or is it a more general effect of xbp-1 mutants - that is, are other conditions that evoke avoidance also affected by mutation of xbp-1? Is feeding behavior on regular E. coli altered in this background? The finding would be more relevant if the authors could clarify or provide more context for their claims here.

      We then asked whether the UPRER (IRE-1/XBP-1) - Innate immunity (PMK-1/p38 MAPK) axis is specific to the evaluation of low-quality food (HK-E. coli). We examined the avoidance behavior phenotype of wild-type and mutant L1 animals by placing them on various food conditions, including normal E. coli food, inedible food (Saprophytic staphylococci) and pathogenic food (Pseudomonas aeruginosa-PA14), for a 24-hour period. We found that N2, pmk-1, and xbp-1 mutant worms did not exhibit avoidance behavior when presented with normal food (OP50). However, both N2 and xbp-1 mutant worms were able to escape from inedible food, Saprophytic staphylococci, whereas pmk-1 mutant worms did not show this avoidance. Notably, xbp-1 mutant worms exhibited even more pronounced avoidance behavior when exposed to Pseudomonas aeruginosa, whereas pmk-1 mutant worms were more susceptible to infection by this pathogen (Figure 2-figure supplement 2C). These findings suggest that the UPR-Immunity pathway plays a crucial role in helping animals avoid low-quality food by triggering an avoidance response. In contrast, the Innate immunity pathway, which is mediated by PMK-1/p38 MAPK, appears to play a key role in evaluating unfavorable food sources, such as HK-E. coli, Saprophytic staphylococci, and Pseudomonas aeruginosa, and helping animals avoid these environments.

      (3) Figure 3C-F: The magnitude of the changes between conditions shown in these panels is small. To what extent does this supplementation represent a full rescue? The findings would be strengthened if figures/images for the control condition (non-HK E. coli) were shown for comparison to allow the reader to assess the extent to which UPR/PMK-1 activation is rescued.

      In response to a reviewer's suggestion, we included live-E. coli as a control in our study. Notably, our data revealed that the addition of lactose, D-(+)-sucrose, and D-(+)-glucose partially inhibited the HK-E. coli-induced unfolded protein response (UPRER) and immune response, suggesting that other nutrients present in live-E. coli may also play a role in inhibiting UPRER.

      We added the following statement to the manuscript: “While sugar effectively inhibits the HK-E. coli-induced UPRER and immune response, it does not fully suppress it to the extent observed with live-E. coli (Figure 3C-F). This implies that additional nutrients present in live-E. coli might also contribute to the inhibition of UPRER and immune response.”

      (4) Figure 5B-D: The magnitude of changes shown between conditions here again appear to be very small, even those labelled as statistically significant. It is important to ensure that the correct statistical tests have been used to assess the significance of these differences (see below).

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t-test was used for comparisons between two groups of samples, and one-way or two-way ANOVA was used for comparisons among more than two groups of samples.

      (5) Methods: In the "Statistical analysis" section, the authors state that "All statistical analyses were performed using Student's t-test". However, this is not the appropriate test to use in experiments where multiple comparisons are made, which is true in several instances across the paper. In these cases, a more appropriate statistical test should be used.

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t-test was used for comparisons between two groups of samples, and one-way or two-way ANOVA was used for comparisons among more than two groups of samples.

      Minor suggestions:

      (1) Figure S2: RNAi is usually delivered in a different E. coli strain, HT115. Is this the case with the RNAi knockdowns in Figure S2, and given that diet can influence UPR activation, is it possible that this different diet could change the phenotypes observed? This should be clarified by the authors.

      In this study, all RNAi experiments involved bleaching adult animals under RNAi strain culture conditions to obtain L1 animals. Subsequently, L1 animals were transferred to HK-E. coli OP50 for phenotype analysis. In response to a reviewer's suggestion, we observed that L1 animals obtained from mothers fed E. coli strains OP50, HT115, or K12 exhibited similar UPR induction under HK-E. coli OP50 feeding conditions (Author response image 4). These findings suggest that variations in diet did not alter the UPR phenotypes.

      Author response image 4.

      L1 animals obtained from mothers fed E. coli strains OP50, HT115, or K12 exhibited similar UPR induction under HK-E. coli OP50 feeding conditions.

      Reviewer #2 (Recommendations For The Authors):

      Line 182: "irg-5::GFP" should be "hsp-4::gfp".

      We thank the reviewer for catching this. We have corrected this error.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      (1) The reporter genes of UPRER and immune response were analyzed in the intestine throughout the study. On the other hand, their rescue experiments suggest that these pathways function in the neurons. They should provide the fluorescence data in the neurons at least for Figures 1F and 1G to confirm that the intestinal response matches the neuronal response and mention that further analyses were done in the intestine for easy scoring.

      Consistent with the results of the RNA sequencing (RNA-seq) analysis, the UPRER reporter (Phsp-4::GFP)8 and immunity reporter (Pirg-5::GFP)9 were strongly induced in the intestine (Figure 1F-G) and in neurons (Figure 1-figure supplement 2A) by feeding unfavorable food (HK-E. coli), suggesting that UPRER and immune pathways may respond to low-quality food (HK-E. coli). Because intestinal fluorescence (Phsp-4::GFP or Pirg-5::GFP) is easy to observe and score, further analyses were done in the intestine.

      (2) I have concerns about the interpretation of the p-PMK-1 data. Although the authors described that "p-PMK-1 is prominently increased" in the text (Line 150), it is unclear on the data (Figure 2E). Similarly, the authors' statement "p-PMK-1 is decreased in animals with D-GlcA (F).." was not fully supported by the data in Figure 4F. The experiment should be repeated and quantified. Moreover, pPMK-1 showed single bands in Figure 2E, but double bands in Figure 3G, 4F, and 4G. The authors should explain why that is the case and which band we should look at for Figures 3G, 4F, and 4G.

      Following the reviewer’s suggestion, we repeated some of the western blots. We found that after longer exposure, there are two bands for pPMK-1 (Figure 2E, new data; and “raw-data WB” file). The VHP-1 phosphatase is known to inhibit PMK-13. In our previous study, we found that treating worms with vhp-1(RNAi) hyperactivates p-PMK-1 (lower band) 4. In contrast, both bands disappear in the pmk-1 mutant (Author response image 5). Thus, the lower band corresponds to pPMK-1. We have now replaced Figure 2E and quantified the relative intensity of pPMK-1/tubulin. We also provide the uncropped western blot images as source data (“raw-data WB” file).

      Author response image 5.

      In our previous study, we found that treating worms with vhp-1(RNAi) hyperactivates p-PMK-1 (lower band) 4. In contrast, both bands disappear in the pmk-1 mutant. These images are taken from our previous study4.

      (3) Heat-killed E. coli (HK-E. coli) is low-quality because the lack of sugar cannot support the growth of C. elegans larvae (Qi and Han, Cell, 2018). Thus, animals do not show the UPRER-immune response and avoidance when HK-E. coli is supplemented with sugars such as glucose (Line 225-227). If these sugars are the key, C. elegans larvae should be able to grow better with HK-E. coli supplemented with glucose. Authors should address this possibility.

      Previous studies have shown that heat-killed E. coli (HK-E. coli) is a low-quality food source that cannot support the growth of C. elegans larvae (ref. 7). Here, we found that sugar deficiency in HK-E. coli induces the UPRER-immune response and avoidance behavior in C. elegans. Given this, we investigated whether sugar supplementation could promote animal growth when fed HK-E. coli. To our surprise, supplementing HK-E. coli with carbohydrates (D-Glc, D-GlcA) did not support animal development (Figure 3-figure supplement 2G), suggesting that carbohydrates are not essential for supporting animal growth on this food source. However, we did find that carbohydrates are critical for inhibiting the UPRER-immune response induced by sugar deficiency in HK-E. coli.

      (4) Line 884: Instead of the Student's t-test, the ANOVA should be used for multiple comparisons.

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t test was used for comparisons between two groups of samples; one-way or two-way ANOVA was used for comparisons among more than two groups of samples.
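
      As an illustration of the distinction drawn above, the following is a minimal sketch of the one-way ANOVA F statistic for three or more groups. It is not the GraphPad Prism procedure the authors used; the function name and the example values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of measurements, one list per condition.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fluorescence scores for three feeding conditions.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      With exactly two groups, this F statistic equals the square of the pooled-variance unpaired t statistic, which is why the t test is sufficient in that case.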

      (5) Although the results are interesting and convincing, the manuscript needs some careful editing and proofreading. As far as I could catch, there are more than 100 errors and typos, as I summarized in minor comments. I recommend the authors proofread thoroughly to make this work easier to read.

      We thank the reviewer for these efforts. We have corrected all of these errors and polished the language of the paper.

      Minor comments:

      (1) Line 30: nature -> natural

      (2) Line 86: elegnas -> elegans

      (3) Line 93: the17h -> the 17h

      (4) Line 97: response -> respond

      (5) Line 106: responded -> respond

      (6) Line 107-109: Add references for the three reporters

      (7) Line 114: immune -> immune pathway

      (8) Line 118: immune depended -> immune-dependent

      (9) Line 128, 594, 596: deferentially -> differentially

      (10) Line 131: Explain what IRE-1-mediated splicing of xbp-1 is, with references

      (11) Line 170: XPB-1 -> XBP-1

      (12) Line 179: URP -> UPR

      (13) Line 181: hsp-4::GFP -> Phsp-4::GFP

      (14) Line 183: Italicize E. coli; mutant -> mutants

      (15) Line 184: irg-5::GFP -> Pirg-5::GFP (2 places)

      (16) Line 197, 203, 206, 207: Lactose -> lactose

      (17) Line 206, 209, 217, 225, 228, 232, 237, 262, 442, 445, 604, 739: Glucose -> glucose

      (18) Line 218: Sugars deficiency -> sugar deficiency

      (19) Line 229: found contribute to -> found to contribute to

      (20) Line 235, 537, 539, 587, 599, 642, 855: Italicize E. coli

      (21) Line 236: same -> the same

      (22) Line 239: I recommend adding "in C. elegans". This study uses both E. coli and C. elegans genetics. Sometimes, it is confusing which organism was mentioned. It should be applied where it is necessary.

      (23) Line 240: additional -> addition

      (24) Line 339, 642: Italicize kgb-1

      (25) Line 390: Italicize Pseudomonas aeruginosa, Bacillus thuringiensis, Staphylococcus aureus, and Serratia marcescens

      (26) Line 394: wiht -> with

      (27) Line 400, 550: Change ER to superscript; Italicize ire-1, xbp-1, and pmk-1

      (28) Line 415: xpb-1 -> xbp-1

      (29) Line 460, 525, 531, 532, 617, 655: Italicize yfbR

      (30) Line 457, 468, 472, 475, 482, 497, 513, 624, 629, 633, 733, 758: Vitamin -> vitamin

      (31) Line 459: Make it clear what is the relationship between vitamin C and TAA

      (32) Line 527: Do not italicize mutant

      (33) Line 538: Phsp-6:GFP -> Phsp-6::GFP (to match other descriptions)

      (34) Line 540: Phsp-4:GFP -> Phsp-4::GFP (to match other descriptions)

      (35) Line 540: Italicize hsp-4

      (36) Line 543: Pirg-5:GFP -> Pirg-5::GFP (to match other descriptions) and italicize irg-5

      (37) Line 550, 881: Innate -> innate

      (38) Line 557, 560, 564, 838: Do not italicize HK

      (39) Line 561: Remove the extra space before "three"

      (40) Line 575, 577: Reporter -> reporter

      (41) Line 575, 607: Italicize Phsp-4::GFP

      (42) Line 577: immunity -> Immunity; Italicize Pirg-5::GFP

      (43) Line 585, 653: keio -> Keio

      (44) Line 586: hsp-4::GFP -> Phsp-4::GFP

      (45) Line 586, 589 (2 places): irg-5::GFP -> Pirg-5::GFP

      (46) Line 597: Remove "all"

      (47) Line 600: Trehalose -> trehalose

      (48) Line 609: Italicize Pirg-5::GFP

      (49) Line 615: critically -> critical

      (50) Line 636: Remove "+"

      (51) Line 656 (2 places), 682: Do not italicize OP50

      (52) Line 664: Lead -> lead

      (53) Line 681: Describe the composition of NGM or show the reference. Since this paper examines nutrition, the composition of the medium is crucial.

      (54) Line 686-706: Italicize all allele names. Be consistent with how to write the promoter to avoid confusion (e.g., ttx-3p -> Pttx-3). Be consistent with how to describe the transgene (e.g., Phsp-4::GFP(zcIs4) -> zcIs4[Phsp-4::GFP])

      (55) Line 710: Describe the composition of LB or show the reference. Since this paper examines nutrition, the composition of the medium is crucial.

      (56) Line 709, 856 (2 places), 858: Do not italicize K12 to make it consistent

      (57) Line 719: Podr-1p:RFP -> Podr-1::RFP

      (58) Line 722, 724: Italicize ges-1 and xbp-1

      (59) Line 723: Pges-1:xbp-1::GFP -> Pges-1::xbp-1::GFP

      (60) Line 735: Glucuronic -> glucuronic

      (61) Line 748: I believe it is 5 mm instead of 0.5 mm

      (62) Line 750: The equation should be (5 mm)^2/(17.5 mm)^2

      (63) Line 759: Remove the period after "pattern".

      (64) Line 766: Describe how they were synchronized

      (65) Line 774: Italicize Psysm-1p::GFP

      (66) Line 785: Insert a space before "until"

      (67) Line 787: the mutant -> mutant

      (68) Line 789, 792, 793, 795 (2 places): GPF -> GFP

      (69) Line 791: next -> Next; an -> a

      (70) Line 799: Remove a space before "MRC".

      (71) Line 804: I do not understand what "until adulthood" means in this context; Remove a space before "by". (I recommend searching double space and correcting it.)

      (72) Line 853: Metabolome -> metabolome

      (73) Line 893-1082: Species and gene names should be italicized in Reference

      (74) Figures 1F, 1G, S2F, S2G: The panels' order should match the bar graphs' order. The apparent difference in the representative data does not match the marginal difference in the bar graph in Fig. 1G. The authors should double-check the results.

      (75) Figure 1F, 2A, 2B, 3C, 3D, 3E, 4D, 4I, S1J, S2A, S2B, S2I, S3B, S3F, S3H: hsp-4::GFP -> Phsp-4::GFP

      (76)  Figure 1G, 2D, 3F, 4E, 4J, S1K, S2H, S3C, S3I: irg-5::GFP -> Pirg-5::GFP

      (77)  Figure 6: Liquids -> Lipids; Italicize ire-1, xbp-1, pmk-1

      (78)  Figure S1I: hsp-6::GFP -> Phsp-6::GFP

      (79)  In the legend for Figure S1 after Figure S1, (A), (B)... were duplicated. It is OK in the corresponding main text (Line 530)

      (80)  Figure S2F, S3G, S4C, S4D: sysm-1::GFP -> Psysm-1::GFP

      (81)  Figure S2G: irg-1::GFP -> Pirg-1::GFP

      (82)  Figure S3H and S3I: Describe which ones are Glu + conditions

      References: 

      (1) Patananan AN, Budenholzer LM, Pedraza ME, Torres ER, Adler LN, Clarke SG. The invertebrate Caenorhabditis elegans biosynthesizes ascorbate. Arch Biochem Biophys 569, 32-44 (2015).

      (2) Yabuta Y, et al. L-Ascorbate Biosynthesis Involves Carbon Skeleton Rearrangement in the Nematode Caenorhabditis elegans. Metabolites 10, (2020).

      (3) Weaver BP, Weaver YM, Omi S, Yuan W, Ewbank JJ, Han M. Non-Canonical Caspase Activity Antagonizes p38 MAPK Stress-Priming Function to Support Development. Dev Cell 53, 358-369 e356 (2020).

      (4) Geng S, et al. Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell Host Microbe 30, 1401-1416 e1408 (2022).

      (5)  Richardson CE, Kooistra T, Kim DH. An essential role for XBP-1 in host protection against immune activation in C. elegans. Nature 463, 1092-1095 (2010).

      (6) Harding HP, et al. An Integrated Stress Response Regulates Amino Acid Metabolism and Resistance to Oxidative Stress. Molecular Cell 11, 619-633 (2003).

      (7) Qi B, Kniazeva M, Han M. A vitamin-B2-sensing mechanism that regulates gut protease activity to impact animal’s food behavior and growth. eLife 6, e26243 (2017).

      (8) Calfon M, et al. IRE1 couples endoplasmic reticulum load to secretory capacity by processing the XBP-1 mRNA. Nature 415, 92-96 (2002).

      (9) Bolz DD, Tenor JL, Aballay A. A Conserved PMK-1/p38 MAPK Is Required in Caenorhabditis elegans Tissue-specific Immune Response to Yersinia pestis Infection*. The Journal of Biological Chemistry 285, 10832 - 10840 (2010).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Ritvo and colleagues present an impressive suite of simulations that can account for three findings of differentiation in the literature. This is important because differentiation (in which items that have some features in common, or share a common associate, are less similar to one another than are unrelated items) is difficult to explain with classic supervised learning models, as these predict the opposite (i.e., an increase in similarity). A few of their key findings are that differentiation requires a high learning rate and low inhibitory oscillations, and is virtually always asymmetric in nature.

      This paper was very clear and thoughtful: an absolute joy to read. The model is simple and elegant, and powerful enough to re-create many aspects of existing differentiation findings. The interrogation of the model and presentation of the findings were both extremely thorough. The potential for this model to be used to drive future work is huge. I have only a few comments for the authors, all of which are relatively minor.

      (1) I was struck by the fact that the "zone" of repulsion is quite narrow, compared with the zone of attraction. This was most notable in the modeling of Chanales et al. (i.e., just one of the six similarity levels yielded differentiation). Do the authors think this is a generalizable property of the model or phenomenon, or something idiosyncratic to do with the current investigation? It seems curious that differentiation findings (e.g., in hippocampus) are so robustly observed in the literature despite the mechanism seemingly requiring a very particular set of circumstances. I wonder if the authors could speculate on this point a bit. For example, might the differentiation zone be wider when competitor "pop up" is low (i.e., low inhibitory oscillations), which could help explain why it's often observed in hippocampus? This seems related a bit to the question about what makes something "moderately" active, or how one could ensure "moderate" activation if they were, say, designing an experiment looking at differentiation.

      We thank the reviewer for this comment. In the previous version of the manuscript, in the section entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activation Dynamics”, we discussed some reasons why differentiation may be more likely to be found in the hippocampus – namely, the high learning rate of the hippocampus and the sparsity of hippocampal activation patterns (pp. 27-28):

      “These results have implications for where to look for differentiation in the brain. Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being "propped up" by top-down feedback from differentiated representations in the hippocampus. This explanation implies that disruptions of hippocampal processing (e.g., lesions, stimulation) will eliminate these neocortical differentiation effects; we plan to test this prediction in future work.

      Additionally, the simulations where we adjusted the oscillation amount (using our model of Schlichting et al., 2015) imply that differentiation will be most evident in brain regions where it is relatively hard to activate competitors. Given the U shape of the NMPH learning rule, limiting competitor activity makes it less likely that plasticity will "cross over" from weakening (and differentiation) to strengthening (and integration). Thus, within the hippocampus, subregions with sparser activity (e.g., dentate gyrus, and to a lesser extent, CA3; Barnes et al., 1990; GoodSmith et al., 2017; West et al., 1991) will be more prone to differentiation. There is strong empirical support for this prediction. For example, Wammes et al. (2022) manipulated the similarity of stimuli in a statistical learning experiment and found that moderate levels of visual similarity were associated with significant differentiation in the dentate gyrus but not other subregions. Also, numerous studies have found greater differentiation in dentate gyrus / CA3 than in CA1 (e.g., Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Molitor et al., 2021; Kim et al., 2017; but see Zheng et al., 2021).”

      In the revised draft we have supplemented this discussion with a new section entitled “Reconciling the Prevalence of Differentiation in the Model and in the Data” (pp. 30-31):

      “A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      (2) With real fMRI data we know that the actual correlation value doesn't matter all that much, and anti-correlations can be induced by things like preprocessing decisions. I am wondering if the important criterion in the model is that the correlations (e.g., as shown in Figure 6) go down from pre to post, versus that they are negative in sign during the post learning period. I would think that here, similar to in neural data, a decrease in correlation would be sufficient to conclude differentiation, but would love the authors' thoughts on that.

      We thank the reviewer for bringing this up. In the paper, we define differentiation as the moving apart of representations – so we agree with the reviewer that it would be appropriate to conclude that differentiation is taking place when correlations go down from pre to post.

      In addition to the definitional question (“what counts as differentiation”), one can also ask the mechanistic question of what is happening in the model at the (simulated) neuronal level in conditions where differentiation (i.e., an average decrease in similarity from pre to post) occurs. Here, the model’s answer is clear: When the similarity of two pairmates decreases, it is because the pairmates have acquired anticorrelated representations at the (simulated) neuronal level. When similarity decreases on average from pre to post, but the average “post” similarity value is not negative, this is because there is a mix of outcomes across runs of the model (due to variance in the initial, random model weights and also variance in the order in which items are presented across training epochs) – some runs lead to differentiation (manifested as anticorrelated pairmate representations) whereas others lead to no change or integration. The average pre-to-post change depends on the relative frequencies with which these different outcomes occur.
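
      The mixture argument above can be made concrete with a small sketch. The proportions and similarity values below are hypothetical placeholders, not results from the model; they only illustrate how an aggregate decrease in similarity can coexist with a non-negative mean "post" value.

```python
def mean_post_similarity(outcomes):
    """Average post-learning similarity over a mixture of run outcomes.

    `outcomes` is a list of (proportion_of_runs, post_similarity) pairs
    whose proportions must sum to 1.
    """
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome proportions must sum to 1")
    return sum(p * s for p, s in outcomes)


# Hypothetical values, for illustration only.
pre_similarity = 0.24
outcomes = [
    (0.4, -0.14),           # runs that differentiate (anticorrelated pairmates)
    (0.1, 0.60),            # runs that integrate
    (0.5, pre_similarity),  # runs with no change
]

mean_post = mean_post_similarity(outcomes)
mean_change = mean_post - pre_similarity
print(f"mean post = {mean_post:.3f}, mean change = {mean_change:.3f}")
```

      In this example the mean change is negative (differentiation in aggregate) even though the mean post-learning similarity stays above zero, because only a subset of runs ends up anticorrelated.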

      We have made several edits to the paper to clarify this point.

      We added a new section under “Results” in our simulation of Chanales et al. (2021) entitled, “Pairs of Items that Differentiate Show Anticorrelated Representations” (p. 15):

      “Figure 6B also highlights that, for learning rates where robust differentiation effects occur in aggregate (i.e., there is a reduction in mean pattern similarity, averaging across model runs), these aggregate effects involve a bimodal distribution across model runs: For some model runs, learning processes give rise to anticorrelated representations, and for other model runs the model shows integration; this variance across model runs is attributable to random differences in the initial weight configuration of the model. The aggregate differentiation effect is therefore a function of the proportion of model runs showing differentiation (here, anticorrelation) and the proportion of model runs showing integration. The fact that differentiation shows up as anticorrelation in the model's hidden layer relates to the learning effects discussed earlier:

      Unique competitor units are sheared away from (formerly) shared units, so the competitor ends up not having any overlap with the target representation (i.e., the level of overlap is less than you would expect due to chance, which mathematically translates into anticorrelation). We return to this point and discuss how to test for anticorrelation in the Discussion section.”
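
      The link between below-chance overlap and anticorrelation can be checked with a short calculation. This is a sketch under stated assumptions: the hidden-layer size of 50 units is hypothetical, while the 6 active units per pattern and the initial overlap of 2 units follow the figures quoted elsewhere in this response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pattern(active_units, n_units):
    """Binary activity vector with ones at the given unit indices."""
    return [1 if i in active_units else 0 for i in range(n_units)]


N_UNITS = 50  # hypothetical hidden-layer size

target = pattern(range(0, 6), N_UNITS)
overlapping = pattern(range(4, 10), N_UNITS)  # shares 2 of 6 units with target
disjoint = pattern(range(6, 12), N_UNITS)     # shares no units with target

r_overlap = pearson(target, overlapping)
r_disjoint = pearson(target, disjoint)
print(f"overlap 2/6: r = {r_overlap:.2f}; overlap 0/6: r = {r_disjoint:.2f}")
```

      With these assumed sizes, chance overlap is 6 × 6 / 50 = 0.72 units, so zero overlap is below chance and the correlation is negative (about -0.14), whereas an overlap of 2 units gives a positive correlation (about 0.24).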

      We added new text to the “Take-Home Lessons” section in the Chanales et al. (2021) simulation (p. 17):

      “In particular, the simulations expose some important boundary conditions for when representational change can occur according to the NMPH (e.g., that differentiation depends on a large learning rate, but integration does not), and the simulations provide a more nuanced account of exactly how representations change (e.g., that differentiation driven by the NMPH is always asymmetric, whereas integration is sometimes asymmetric and sometimes symmetric; and that, when differentiation occurs on a particular model run, it tends to give rise to anticorrelated representations in the model's hidden layer).”

      We added new text to the “Nature of Representational Change” section in the Favila et al. (2016) simulation (p. 21):

      “Figure 8 - Supplement 1 also indicates that, as in our simulation of Chanales et al. (2021), individual model runs where differentiation occurs show anticorrelation between the pairmate representations, and gradations in the aggregate level of differentiation that is observed across conditions reflect differences in the proportion of trials showing this anticorrelation effect.”

      We added new text to the “Take-Home Lessons” section in the Favila et al. (2016) simulation (p. 21):

      “As in our simulation of Chanales et al. (2021), we found that the NMPH-mediated differentiation was asymmetric, manifested as anticorrelation between pairmate representations on individual model runs, and required a high learning rate, leading to abrupt representational change.”

      We added new text to the “Nature of Representational Change” section in the Schlichting et al. (2015) simulation (p. 26):

      “Also, as in our other simulations, when differentiation occurs on a particular model run it tends to give rise to anticorrelated representations (results not shown).”

      We added new text to the “Take-Home Lessons” section in the Schlichting et al. (2015) simulation (pp. 26-27):

      “As in the other versions of our model, differentiation requires a high learning rate, and – on model runs when it occurs – it is asymmetric and gives rise to anticorrelated representations.”

      We added new text at the start of the Discussion (p. 27):

      “In addition to qualitatively replicating the results from the studies we simulated, our model gives rise to several novel predictions – most notably, that differentiation driven by the NMPH requires a rapid learning rate and, when it occurs for a particular pair of items, it is asymmetric and gives rise to anticorrelated representations.”

      We also added a new section in the Discussion entitled “Testing the Model's Prediction about Anticorrelation”, which (among other things) highlights the reviewer’s point that fMRI pattern similarity values can be affected by preprocessing choices (p. 30):

      “Even though we operationally define differentiation as a reduction in similarity with learning, the way that it actually shows up on individual model runs is as anticorrelation between pairmates; in the model, the size of the aggregate differentiation effect is determined by the proportion of model runs that show this anticorrelation effect (vs. no change or integration). This implies that, if we could get a clean measurement of the similarity of pairmates in an experiment, we might see a multimodal distribution, with some pairmates showing anticorrelation, and others showing increased correlation (integration) or no change in similarity. This kind of clean readout of the similarity of individual pairs might be difficult to obtain with fMRI; it is more feasible that this could be obtained with electrophysiology. Another challenge with using fMRI to test this prediction is that anticorrelation at the individual-neuron level might not scale up to yield anticorrelation at the level of the BOLD response; also, fMRI pattern similarity values can be strongly affected by preprocessing choices – so a negative pattern similarity value does not necessarily reflect anticorrelation at the individual-neuron level. A final caveat is that, while we predict that differentiation will show up as anticorrelation in the brain region that gives rise to the differentiation effect, this might not translate into anticorrelation in areas that are downstream of this region (e.g., if the hippocampus is the source of the differentiation effect, we would expect anticorrelation there, but not necessarily in neocortical regions that receive input from the hippocampus; we revisit this point later in the discussion, when we address limitations and open questions).”

      We added new text in the Discussion, under “Limitations and Open Questions” (p. 31):

      “Importantly, while hippocampus can boost the representation of unique features in neocortex, we expect that neocortex will continue to represent shared perceptual features (e.g., in Favila et al., 2016, the fact that both pairmates are photos of barns). For this reason, in paradigms like the one used by Favila et al. (2016), the predicted effect of hippocampal differentiation on neocortical representations will be a reduction in pattern similarity (due to upregulation in the representation of unique pairmate features) but neocortex should not cross over into anticorrelation in these paradigms (due to its continued representation of shared perceptual features). Indeed, this is exactly the pattern that Wanjia et al. (2021) observed in their study, which used similar stimuli to those used in Favila et al. (2016).”

      Lastly, we updated the Abstract (p. 1):

      “What determines when neural representations of memories move together (integrate) or apart (differentiate)? Classic supervised learning models posit that, when two stimuli predict similar outcomes, their representations should integrate. However, these models have recently been challenged by studies showing that pairing two stimuli with a shared associate can sometimes cause differentiation, depending on the parameters of the study and the brain region being examined. Here, we provide a purely unsupervised neural network model that can explain these and other related findings. The model can exhibit integration or differentiation depending on the amount of activity allowed to spread to competitors – inactive memories are not modified, connections to moderately active competitors are weakened (leading to differentiation), and connections to highly active competitors are strengthened (leading to integration). The model also makes several novel predictions – most importantly, that when differentiation occurs as a result of this unsupervised learning mechanism, it will be rapid and asymmetric, and it will give rise to anticorrelated representations in the region of the brain that is the source of the differentiation. Overall, these modeling results provide a computational explanation for a diverse set of seemingly contradictory empirical findings in the memory literature, as well as new insights into the dynamics at play during learning.”
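
      The three regimes described in the abstract (no change, weakening, strengthening) can be sketched as a piecewise-linear U-shaped function of competitor activity. This is an illustrative sketch only: the function name, the breakpoints, and the magnitudes are hypothetical and are not the parameter values used in the model.

```python
def nmph_weight_change(coactivity, low=0.2, dip=0.4, cross=0.6,
                       max_weaken=-1.0, max_strengthen=1.0):
    """U-shaped mapping from coactivity (0 to 1) to a weight change.

    coactivity <= low      : no change (inactive memories are not modified)
    low < coactivity < cross: weakening, strongest at `dip`
    coactivity > cross     : strengthening, growing toward coactivity = 1
    All breakpoints are hypothetical illustration values.
    """
    if coactivity <= low:
        return 0.0
    if coactivity <= dip:
        # Descend linearly from 0 at `low` to max_weaken at `dip`.
        return max_weaken * (coactivity - low) / (dip - low)
    if coactivity <= cross:
        # Climb linearly from max_weaken at `dip` back to 0 at `cross`.
        return max_weaken * (cross - coactivity) / (cross - dip)
    # Climb linearly from 0 at `cross` to max_strengthen at 1.
    return max_strengthen * (coactivity - cross) / (1.0 - cross)


for level in (0.1, 0.4, 0.9):
    change = nmph_weight_change(level)
    if change < 0:
        label = "weaken"
    elif change > 0:
        label = "strengthen"
    else:
        label = "no change"
    print(f"coactivity {level}: {label}")
```

      Under this sketch, limiting how active a competitor can become keeps it in the weakening range, which is the intuition behind the claim that sparser activity favors differentiation.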

      (3) For the modeling of the Favila et al. study, the authors state that a high learning rate is required for differentiation of the same-face pairs. This made me wonder what happens in the low learning rate simulations. Does integration occur?

      For the same-face condition of the Favila simulation, lowering the learning rate does not result in an overall integration effect:

      Author response image 1.

      In other cases, we do see integration emerge at lower learning rates – e.g., in the Schlichting interleaved condition we see a small integration effect emerge for a learning rate value of 0.3:

      Author response image 2.

      Our view is that, while integration can emerge at low learning rates, it is not a reliable property of the model – in some cases, there is a “window” of learning rates where there is enough learning to drive integration but not enough to drive differentiation, and in other cases there is not. Given this lack of reliability across simulations, we would prefer not to discuss this in the paper.

      This paradigm has a lot of overlap with acquired equivalence, and so I am thinking about whether these are the sorts of small differences (e.g., same-category scenes and perhaps a high learning rate) that bias the system to differentiate instead of integrate.

      We agree that it would be very interesting to use the model to explore acquired equivalence and related phenomena, but we think it is out of scope of the current paper. We have added some text to the Discussion under “Limitations and Open Questions” (p. 32):

      “Another important future direction is to apply the model to a wider range of learning phenomena involving representational change – for example, acquired equivalence, which (like some of the studies modeled here) involves linking distinct stimuli to a shared associate (see, e.g., Honey and Hall, 1989; Shohamy and Wagner, 2008; Myers et al., 2003; Meeter et al., 2009; de Araujo Sanchez and Zeithamova, 2023). It is possible that some of these phenomena might be better explained by supervised learning, or a mixture of unsupervised and supervised learning, than by unsupervised learning alone.”

      (4) For the simulations of the Schlichting et al. study, the A and B appear to have overlap in the hidden layer based on Figure 9, despite there being no similarity between the A and B items in the study (in contrast to Favila et al., in which they were similar kinds of scenes, and Chanales et al., in which they were similar colors). Why was this decision made? Do the effects depend on some overlap within the hidden layer? (This doesn't seem to be explained in the paper that I saw though, so maybe it's just a visualization error?)

      Overlap in the pretrained hidden representations of A and B is not strictly necessary for these effects – it would be possible to reconfigure other parameters to get high levels of competition even if there were no overlap (e.g., by upregulating the strengths of connections from shared input features). Having said that, it is definitely true that overlap between the pretrained hidden representations boosts competition, and we think it is justified to posit this in the Schlichting simulation. We have now added an explanation for this in the paper (p. 23):

      New text in Schlichting, “Knowledge Built into the Network”:

      “Matching the previous two simulations, we pretrained the weights so the hidden representations of the stimuli initially had 2/6 units in common. Even though the A and B stimuli used in the actual experiment did not have obvious feature overlap (they were randomly selected novel objects), it is important to note that the hidden layer is not simply a representation of the sensory features of the A and B stimuli; the hidden layer also receives input from the output layer, which represents the shared associate of A and B (X). We think that the presence of this shared associate justifies our use of initially-overlapping hidden representations.”

      (5) It seems as though there were no conditions under which the simulations produced differentiation in both the blocked and intermixed conditions, which Schlichting et al. observed in many regions (as the present authors note). Is there any way to reconcile this difference?

      We thank the reviewer for bringing this up. If we set the connection strength between X (in the output layer) and A (in the hidden layer) in the blocked condition to .9 instead of .999 (keeping this connection strength at .8 for the interleaved condition) and we set Osc to .0615, we observe differentiation in both conditions.

      Rather than replacing the original results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 10 - Supplement 1), which is included on p. 46.

      We also added the following to the Results section of the Schlichting simulation in the main text (p. 26):

      “Figure 10 - Supplement 1 shows results from an alternative parameterization where, in the low-oscillation-amplitude condition, differentiation is observed in both the blocked and interleaved conditions (mirroring results from Schlichting et al., 2015, who found differentiation in both conditions in several regions of interest, including parts of the hippocampus and medial prefrontal cortex).”

      (6) A general question about differentiation/repulsion and how it affects the hidden layer representation in the model: Is it the case that the representation is actually "shifted" or repelled over so it is no longer overlapping? Or do the shared connections just get pruned, such that the item that has more "movement" in representational space is represented by fewer units on the hidden layer (i.e., is reduced in size)? I think, if I understand correctly, that whether it gets shifted vs. reduced would depend on the strength of connections along the hidden layer, which would in turn depend on whether it represents some meaningful continuous dimension (like color) or not. But, if the connections within the hidden layer are relatively weak and it is the case that representations become reduced in size, would there be any anticipated consequences of this (e.g., cognitively/behaviorally)?

      The representations are shifted – this is discussed in the Chanales results section:

      “Because the activity “set point” for the hidden layer (determined by the kWTA algorithm) involves having 6 units active, and the unique parts of the competitor only take up 4 of these 6 units, this leaves room for activity to spread to additional units. Given the topographic projections in the output layer, the model is biased to “pick up” units that are adjacent in color space to the currently active units; because activity cannot flow easily from the competitor back to the target (as a result of the aforementioned severing of connections), it flows instead *away* from the target, activating two additional units, which are then incorporated into the competitor representation. This sequence of events (first a severing of the shared units, then a shift away from the target) completes the process of neural differentiation, and is what leads to the behavioral repulsion effect in color recall (because the center-of-mass of the color representation has now shifted away from the target).”
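
      The role of the kWTA "set point" in the passage above can be illustrated with a minimal sketch. It assumes a hard top-k selection, which is a simplification of the graded, oscillating inhibition used in the actual model; the function name `kwta` and the drive values are hypothetical and chosen only to mirror the "4 unique units plus 2 adjacent units" scenario.

```python
import numpy as np

def kwta(net_input, k=6):
    """Keep the k most strongly driven units active and silence the rest.

    Hard top-k simplification, for illustration only.
    """
    net_input = np.asarray(net_input, dtype=float)
    active = np.zeros_like(net_input)
    # argsort is ascending, so the last k indices are the k largest inputs.
    winners = np.argsort(net_input, kind="stable")[-k:]
    active[winners] = net_input[winners]
    return active

# Hypothetical drive to 10 hidden units laid out along a color dimension.
# Units 2-5 are the competitor's unique units; units 6 and 7 are adjacent
# units that receive weaker topographic input.
drive = np.array([0.05, 0.1, 0.9, 0.9, 0.9, 0.9, 0.4, 0.3, 0.1, 0.05])
activity = kwta(drive, k=6)
print(np.flatnonzero(activity))
```

      Because only four units are strongly driven, the six-unit set point is filled by the two next-best units, which in this toy layout are the neighbors on one side of the competitor.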

      Reviewer #2 (Public Review):

      This paper addresses an important computational problem in learning and memory. Why do related memory representations sometimes become more similar to each other (integration) and sometimes more distinct (differentiation)? Classic supervised learning models predict that shared associations should cause memories to integrate, but these models have recently been challenged by empirical data showing that shared associations can sometimes cause differentiation. The authors have previously proposed that unsupervised learning may account for these unintuitive data. Here, they follow up on this idea by actually implementing an unsupervised neural network model that updates the connections between memories based on the amount of coactivity between them. The goal of the authors' paper is to assess whether such a model can account for recent empirical data at odds with supervised learning accounts. For each empirical finding they wish to explain, the authors built a neural network model with a very simple architecture (two input layers, one hidden layer, and one output layer) and with prewired stimulus representations and associations. On each trial, a stimulus is presented to the model, and inhibitory oscillations allow competing memories to pop up. Pre-specified u-shaped learning rules are used to update the weights in the model, such that low coactivity leaves model connections unchanged, moderate coactivity weakens connections, and high coactivity strengthens connections. In each of the three models, the authors manipulate stimulus similarity (following Chanales et al), shared vs distinct associations (following Favila et al), or learning strength (a stand-in for blocked versus interleaved learning schedule; following Schlichting et al) and evaluate how the model representations evolve over trials.

      As a proof of principle, the authors succeed in demonstrating that unsupervised learning with a simple u-shaped rule can produce qualitative results in line with the empirical reports. For instance, they show that pairing two stimuli with a common associate (as in Favila et al) can lead to *differentiation* of the model representations. Demonstrating these effects isn't trivial and a formal modeling framework for doing so is a valuable contribution. Overall, the authors do a good job of both formally describing their model and giving readers a high level sense of how their critical model components work, though there are some places where the robustness of the model to different parameter choices is unclear. In some cases, the authors are very clear about this (e.g. the fast learning rate required to observe differentiation). However, in other instances, the paper would be strengthened by a clearer reporting of the critical parameter ranges.

      We thank the reviewer for raising this point. The interdependence of parameters in our model makes it infeasible to identify critical parameter ranges. We have added a paragraph to the “Approach to Parameterization and Data Fitting” section in the Methods to address this point (p. 33):

      “The overall goal of this modeling work is to account for key empirical regularities regarding differentiation and integration and to establish boundary conditions on these regularities. As such, the modeling work described below focuses more on qualitative fits to general properties of the data space than on quantitative fits to results from specific studies. Automatic parameter optimization is not feasible for this kind of model, given the large number of model parameters and the highly interactive, nonlinear nature of competitive dynamics in the model; consequently, model fitting was done by hand.

      These complex interactions between parameters also make it infeasible to list “critical parameter ranges” for generating particular model outcomes. Our experience in working with the model has been that activation dynamics are what matter most for learning, and that disparate parameter sets can give rise to the same activation dynamics and -- through this -- the same learning effects; likewise, similar parameter sets can give rise to different activation dynamics and different learning outcomes. Consequently, in this paper we have focused on characterizing the dynamics that give rise to different learning effects (and how they can be affected by local parameter perturbations, e.g., relating to learning rate and oscillation size), rather than the – impossible, we believe – task of enumerating the full set of parameter configurations that give rise to a particular result.”

      For instance, it's clear from the manipulation of oscillation strength in the model of Schlichting et al that this parameter can dramatically change the direction of the results. The authors do report the oscillation strength parameter values that they used in the other two models, but it is not clear how sensitive these models are to small changes in this value.

      In some cases, the effects of oscillation strength are relatively smooth. For example, in the Favila simulation, increasing the oscillation amplitude Osc effectively recapitulates the U-shaped curve (i.e., higher levels of Osc lead to more competitor activation, which initially leads to weakening / differentiation but then gives way to strengthening / integration), as is shown for the Favila Different Face condition in this plot:

      Author response image 3.

      In the Chanales 2/6 overlap condition, the effects of varying Osc are more nonlinear:

      Author response image 4.

      We think this is attributable to the increased “all-or-none” recurrent dynamics in this simulation (due to the recurrent projections within the output layer), which make it more difficult to evoke moderate (vs. high) levels of activation. This difficulty in reliably obtaining graded activation dynamics is likely a consequence of the small-scale (“toy”) nature of the model and the simple inhibitory mechanisms employed here, as opposed to being a generalizable property of the brain – presumably, the actual brain employs more nuanced and effective means of controlling activation. Furthermore, we don’t think that the high prevalence of integration in the model’s parameter space necessarily translates into a prediction that integration should be more prevalent overall – see the new “Reconciling the Prevalence of Differentiation in the Model and in the Data” section described in response to one of the reviewer’s other points below. Due to the paper already being quite long, we have opted not to include the above plots / discussion in the paper.

      Similarly, it's not clear whether the 2/6 hidden layer overlap (only explicitly manipulated in the model of Chanales et al) is required for the other two models to work.

      When we were parameterizing the model, we opted to keep the 2/6 level of overlap for all of the simulations and we adjusted other parameters to fit the data; in part, this was because overlap can only be adjusted in discrete jumps, whereas other influential parameters in the model can be adjusted in a more graded, real-valued way. Our use of 2/6 overlap (as opposed to, say, 1/6 or 3/6 overlap) for the Favila and Schlichting models was done out of convenience, and should not be interpreted as a strong statement that this particular level of overlap is necessary for obtaining differentiation; we could easily get the model to show differentiation given other overlap levels by adjusting other parameters.

      Finally, though the u-shaped learning rule is essential to this framework, the paper does little formal investigation of this learning rule. It seems obvious that allowing the u-shape to collapse too much toward a horizontal line would reduce the model's ability to account for empirical results, but there may be other more interesting features of the learning rule parameterization that are essential for the model to function properly.

      Given that the paper is already quite long, we have opted not to include further exploration of the parameters of the U-shaped learning rule in the paper. However, for the reviewer’s information, we report the effects of a few illustrative manipulations of these parameters below. As a general principle, the effects of these manipulations make sense in light of the theoretical framework described in the paper.

      For example, the parameter “DRevMag” controls the size of the negative “dip” in the U-shaped curve (more negative values = a larger dip). Given that this negative dip is essential for severing weights to competitors and causing differentiation, shifting DRevMag upwards towards zero should shift the balance of the model away from differentiation and towards integration. This is indeed what we observe, as shown in this parameter sweep from the Chanales simulation:

      Author response image 5.

      As another example: The “DRev” parameter controls where the U-shaped curve transitions from negative weight change to positive weight change. Lower values of DRev mean that the region of coactivity values leading to negative weight change will be smaller, and the region of coactivity values leading to positive weight change will be larger. As such, we would expect that lower values of DRev would bias the model toward integration. That is indeed the case, as shown in this parameter sweep from the Schlichting Blocked simulation:

      Author response image 6.

      There are a few other points that may limit the model's ability to clearly map onto or make predictions about empirical data. The model(s) seems very keen to integrate and do so more completely than the available empirical data suggest. For instance, there is a complete collapse of representations in half of the simulations in the Chanales et al model and the blocked simulation in the Schlichting et al model also seems to produce nearly complete integration. Even if the Chanales et al paper had observed some modest behavioral attraction effects, this model would seem to over-predict integration. The authors somewhat implicitly acknowledge this when they discuss the difficulty of producing differentiation ("Practical Advice for Getting the Model to Show Differentiation") and not of producing integration, but don't address it head on.

      We thank the reviewer for this comment – R1 had a similar comment. We have added a new section to the Discussion to address this point (p. 30):

      “Reconciling the Prevalence of Differentiation in the Model and in the Data.

      A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      Second, the authors' choice of strongly prewiring associations in the Chanales and Favila models makes it difficult to think about how their model maps onto experimental contexts where competition is presumably occurring while associations are only weakly learned. In the Chanales et al paper, for example, the object-face associations are not well learned in initial rounds of the color memory test. While the authors do justify their modeling choice and their reasons have merit, the manipulation of AX association strength in the Schlichting et al model also makes it clear that the association strength has a substantial effect on the model output. Given the effect of this manipulation, more clarity around this assumption for the other two models is needed.

      We thank the reviewer for bringing this up. We have edited the section entitled “A Note on Prewiring Representations” in the Methods to further justify our choice to prewire associations in the Chanales and Favila models (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.
      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Overall, this is strong and clearly described work that is likely to have a positive impact on computational and empirical work in learning and memory. While the authors have written about some of the ideas discussed in this paper previously, a fully implemented and openly available model is a clear advance that will benefit the field. It is not easy to translate a high-level description of a learning rule into a model that actually runs and behaves as expected. The fact that the authors have made all their code available makes it likely that other researchers will extend the model in numerous interesting ways, many of which the authors have discussed and highlighted in their paper.

      Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pairmate. Instead, the model assumes such pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      We thank the reviewer for this comment. In the revised manuscript, we have edited the “Note on Prewiring Representations” in the Methods to clarify how our assumptions about prewiring relate to what we really think is happening in the brain (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.
      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Although, as the authors mentioned, there haven't been formal empirical tests of the relationship between learning speed and differentiation/integration, I am also wondering to what degree the prediction of fast learning being necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. On the other hand, Guo et al (2021) showed that humans may need a few blocks of training and test to start showing differentiation.

      We thank the reviewer for mentioning this. We have added a paragraph to the “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” section of the Discussion that addresses this point (pp. 28-29):

      “Although the results from Wanjia et al. (2021) provide strong support for the model's prediction that differentiation will be abrupt, they raise another question: What explains variance across items in when this abrupt change takes place? The answer to this question remains to be seen, but one possibility is encoding variability: If we assume that participants stochastically sample (i.e., attend to) the features of the scene pairmates, it is possible that participants might initially fail to sample the features that distinguish the scene pairmates, which can be quite subtle – and if the distinguishing features of the pairmates are not represented in high-level visual regions (i.e., the pairmates are represented in these regions as having the same features), this could delay the onset of differentiation until the point at which the distinguishing features happen (by chance) to be sampled.”

      Related to the point above, the high learning rate prediction also seems to be at odds with the finding that the cortex, which has slow learning (according to the theory of complementary learning systems), also shows differentiation in Wammes et al (2022).

      We now address this point in the section of the Discussion entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” (p. 27):

      “Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus.”

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      We thank the reviewer for this comment. We have made two major changes to address this concern. First, we have edited the “Learning” section within “Basic Network Properties” in the main text (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
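
      To make the verbal description above concrete, here is a minimal sketch of an end-of-trial update in which coactivity is the product of two units' activations and the weight change is a U-shaped function of that coactivity. Everything specific in it is an assumption for illustration: the piecewise-linear shape, the breakpoint names and values (`no_change_below`, `dip_at`, `dip_depth`, `zero_cross`, `plateau_at`, `max_gain`), the multiplicative learning rate, and the clipping bounds are hypothetical and are not the parameterization reported in the Methods.

```python
import numpy as np

def u_shaped(coactivity, no_change_below=0.2, dip_at=0.4, dip_depth=-0.1,
             zero_cross=0.6, plateau_at=0.8, max_gain=0.1):
    """Piecewise-linear U-shaped map from coactivity to weight change.

    All breakpoints and magnitudes are hypothetical, not the paper's values.
    """
    xs = [0.0, no_change_below, dip_at, zero_cross, plateau_at, 1.0]
    ys = [0.0, 0.0, dip_depth, 0.0, max_gain, max_gain]
    return np.interp(coactivity, xs, ys)

def update_weights(weights, pre_act, post_act, lrate=1.0, w_min=0.0, w_max=1.0):
    """Apply one end-of-trial update to a pre-by-post weight matrix."""
    # Coactivity of each connected pair is the product of the two activations.
    coactivity = np.outer(pre_act, post_act)
    new_weights = weights + lrate * u_shaped(coactivity)
    # Assumed here: connection strengths are kept within fixed bounds.
    return np.clip(new_weights, w_min, w_max)

pre = np.array([1.0, 0.4, 0.1])   # strongly, moderately, weakly active senders
post = np.array([1.0])            # one strongly active receiver
w = np.full((3, 1), 0.5)
print(update_weights(w, pre, post).ravel())
```

      In this toy case the strongly coactive connection is strengthened, the moderately coactive one is weakened, and the weakly coactive one is left unchanged, which is the qualitative pattern the U-shaped rule is meant to produce.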

      Second, we have added the requested equations to the “Learning” part of the Methods (pp. 37-38):

      On the right side of the function, strong activation leads to strengthening of the connectivity, which I assume will lead to stronger activation on the next time point. The model has an upper limit of connection strength to prevent connections from strengthening too much. The same idea can be applied to the left side of the function: instead of having two turning points, it can be a linear function such that low activation keeps weakening the connection until the lower limit is reached. This way the NMPH function can take a simpler form (e.g., two line-segments if you think the weakening and strengthening take different rates) and may still simulate the data.

      We thank the reviewer for mentioning this. We have added a new paragraph in the “Learning” section of the Methods to justify the particular shape of the learning curve (pp. 38-39):

      “Evidence for the U-shaped plasticity function used here (where low activation leads to no change, moderate activation leads to weakening, and higher levels of activation lead to strengthening) was previously reviewed in Ritvo et al. (2019). In brief, there are three lines of work that support the U shape: First, multiple neurophysiological studies have found that moderate postsynaptic depolarization leads to synaptic weakening and higher levels of depolarization lead to synaptic strengthening (e.g., Artola et al., 1990; Hansel et al., 1996). Second, human neuroscience studies have used pattern classifiers, applied to fMRI and EEG data, to measure memory activation, and have related this measure to subsequent memory accessibility; several studies using this approach have found that low levels of activation lead to no change in memory strength, moderate levels of activation lead to impaired subsequent memory, and higher levels of activation lead to increased subsequent memory (e.g., Newman and Norman, 2010; Detre et al., 2013; Kim et al., 2014; for related findings, see Lewis-Peacock and Norman, 2014; Wang et al., 2019). Third, a recent human fMRI study by Wammes et al. (2022) manipulated memory activation by varying the visual similarity of pairmates and observed a U-shaped function relating visual similarity to representational change in the hippocampus, whereby low levels of pairmate similarity were associated with no change, moderate levels of similarity were associated with differentiation, and the differentiation effect went away at higher levels of similarity.”

      We have also included a pointer to this new paragraph in the “Nonmonotonic Plasticity Hypothesis” section of Introduction (p. 2):

      “(for further discussion of the empirical justification for the NMPH, see the Learning subsection in the Methods)”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few additional minor things about data presentation and the like:

      (1) Figure 1 legend - a more general description of how to interpret the figure might be helpful for more naive readers (e.g., explaining how one can visualize in the schematic that there is overlap in the hidden layer between A and B). Also, from the Figure 1 depiction, it's not clear what is different about the setup from the initial left hand side panels in A, B, C, to make it such that activity spreads strongly to A in panel A, weakly in panel B, and not at all in panel C since the weights are the same. Is there a way to incorporate this into the graphic, or describe it in words?

      To address this point, we have added the following text to the Figure 1 caption (p. 3):

      “Note that the figure illustrates the consequences of differences in competitor activation for learning, without explaining why these differences would arise. For discussion of circumstances that could lead to varying levels of competitor activation, see the simulations described in the text.”

      (2) I believe not all of the papers cited on lines 193-195 actually have similarity manipulations in them. I'd recommend double checking this list and removing those less relevant to the statement.

      Thank you for pointing this out; we have removed the Ballard reference and we have clarified what we mean by similarity reversal (p. 7):

      “The study was inspired by recent neuroimaging studies showing “similarity reversals”, wherein stimuli that have more features in common (or share a common associate) show less hippocampal pattern similarity (Favila et al., 2016; Schlichting et al., 2015; Molitor et al., 2021; Chanales et al., 2017; Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Zeithamova et al., 2018; Jiang et al., 2020; Wammes et al., 2022).”

      (3) I wanted a bit more detail about how the parameters were set in the main paper, not just in the methods. Even something as brief as noting that model fitting was done by hand by tweaking parameters to re-create the empirical patterns (if I'm understanding correctly) would have been helpful for me.

      To address this point, we have added the following text under “Basic Network Properties” (p. 4):

      “Our goal was to qualitatively fit key patterns of results from each of the aforementioned studies. We fit the parameters of the model by hand as they are highly interdependent (see the Methods section for more details).”

      (4) In Figure 4E, it would be helpful to describe the x and y axes of the MDS plots in the legend.

      To address this point, we have added the following new text to the Figure 4 caption that clarifies how the MDS plots were generated (p. 11):

      “MDS plots were rotated, shifted, and scaled such that pairmate 1before is located at (0,0), pairmate 2before is located directly to the right of pairmate 1before, and the distance between pairmate 1before and pairmate 2before is proportional to the baseline distance between the pairmates.”
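
      The alignment described in this caption text (shift, rotate, then scale) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' plotting code: the function name `align_mds`, its arguments, and the use of a fixed `target_distance` to stand in for "proportional to the baseline distance" are all hypothetical. Reflections, which MDS also leaves undetermined, are not handled here.

```python
import math


def align_mds(points, i1, i2, target_distance):
    """Shift, rotate, and scale 2-D MDS coordinates (hypothetical helper).

    After alignment, points[i1] sits at (0, 0), points[i2] lies on the
    positive x-axis, and the distance between them equals target_distance.
    """
    # Shift: move the reference point (pairmate 1 before learning) to the origin.
    x0, y0 = points[i1]
    shifted = [(x - x0, y - y0) for x, y in points]

    # Direction and length of the vector from the first to the second reference point.
    dx, dy = shifted[i2]
    current = math.hypot(dx, dy)
    if current == 0:
        raise ValueError("reference points coincide; rotation is undefined")
    cos_t, sin_t = dx / current, dy / current

    # Scale factor so that the reference distance matches the requested one.
    scale = target_distance / current

    aligned = []
    for x, y in shifted:
        # Rotate by -theta so the second reference point lands on the +x axis.
        xr = x * cos_t + y * sin_t
        yr = -x * sin_t + y * cos_t
        aligned.append((xr * scale, yr * scale))
    return aligned
```

      Because the same rigid transform and uniform scaling are applied to every point, relative distances among all items in the plot are preserved up to the common scale factor.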

      (5) Figure 6 - at first I thought the thicker line was some sort of baseline, but I think it is just many traces on top of one another. If other readers may be similarly confused, perhaps this could be stated.

      Thanks for this comment. We have updated Figure 6 (p. 16).

      We have also updated the caption.

      (6) I am having a lot of difficulty understanding the terms "competitor-to-competitor," "competitor-to-target/shared," and "target/shared-to-target/shared," and therefore I don't fully get Figure 5. I think it might be helpful to expand the description of these terms where they are first introduced in the paper (p. 13?). I think I am missing something crucial here, and I am not quite sure what that is, which I know is not very helpful! But, to narrate my confusion a bit, I thought that these terms would somehow relate to connections between different connections of the network. For example, is competitor-to-competitor within the hidden layer? Or is this somehow combining across relevant connections that might span different pairs of layers in the model? And, I really have no idea why it is "target/shared."

      Thank you for these comments. We have updated Figure 5 and we have also made several changes to the main text and the figure caption to address these points.

      Changes to the main text (p. 13):

      “Whether symmetric or asymmetric integration occurs depends on the relative strengths of connections between pairs of unique competitor units (competitor-competitor connections) compared to connections between unique competitor units and shared units (competitor-shared connections) after the first trial (Figure 5; note that the figure focuses on connections between hidden units, but the principle also applies to connections that span across layers). Generally, coactivity between unique competitor units (competitor-competitor coactivity) is less than coactivity between unique competitor units and shared units (competitor-shared coactivity), which is less than coactivity between unique target units and shared units (target-shared coactivity).”

      (7) Relatedly in Figure 13, I understand how some competitor-to-target/shared connections could be spared in the bottom instance given panel B. However, I'm struggling to understand how that relates to the values in the corresponding chart in panel A. What about panel A, bottom (vs. the top) means lower coactivities between some competitor-to-target/shared? Is it because if the noise level is higher, the "true" activation of competitor-to-target/shared connections is weaker? I think again, I'm missing something critical here! and wonder if other readers may be in the same situation. (I know the authors described this also on p. 36, but I'm still confused!)

      We have updated Figure 13 to clarify these points.

      (8) In Figure 9, I believe there is no caption for panel D. Also, it looks as though the item unit active for A and B is the same. I wonder if this is an error?

      Thank you for catching these errors! They have both been fixed.

      Reviewer #2 (Recommendations For The Authors):

      -Perhaps I missed it, but I think defining coactivity (how it is computed) in the main text would be useful for readers, as this is critical for understanding the model. I did find it in the methods.

      We thank the reviewer for this suggestion. We have updated the “Learning” section within “Basic Network Properties” in the main text to address this point (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
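
      To make the learning rule concrete, here is a minimal sketch of a U-shaped function of coactivity. It is an assumption-laden illustration rather than the model's actual implementation: the piecewise-linear shape, the function names, and every threshold and magnitude (`dip_start`, `dip_bottom`, `reversal`, `max_weaken`, `max_strengthen`) are hypothetical values chosen for readability. The only elements taken from the text are that coactivity is the product of the two units' activations and that the change in connection strength follows a U-shaped function of that coactivity.

```python
def coactivity(act_x, act_y):
    """Coactivity of two connected units: the product of their activations."""
    return act_x * act_y


def u_shaped_delta(coact, dip_start=0.2, dip_bottom=0.4, reversal=0.6,
                   max_weaken=-1.0, max_strengthen=1.0):
    """Hypothetical piecewise-linear U-shaped learning function.

    Returns the (unscaled) change in connection strength for a given
    coactivity in [0, 1]. All parameter values are illustrative assumptions.
    """
    if coact <= dip_start:
        # Low coactivity: no change.
        return 0.0
    if coact <= dip_bottom:
        # Descending arm of the U: weakening grows with coactivity.
        frac = (coact - dip_start) / (dip_bottom - dip_start)
        return frac * max_weaken
    if coact <= reversal:
        # Ascending arm of the U: weakening shrinks back toward zero.
        frac = (coact - dip_bottom) / (reversal - dip_bottom)
        return (1.0 - frac) * max_weaken
    # High coactivity: strengthening grows with coactivity.
    frac = (coact - reversal) / (1.0 - reversal)
    return frac * max_strengthen


def update_weight(weight, act_x, act_y, learning_rate=0.1):
    """Apply one end-of-trial update and keep the weight within [0, 1]."""
    delta = u_shaped_delta(coactivity(act_x, act_y))
    return min(1.0, max(0.0, weight + learning_rate * delta))
```

      In this sketch, two units with activations 0.5 and 0.8 have a coactivity of 0.4, which falls at the bottom of the dip and therefore weakens their connection, whereas two fully active units strengthen it.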

      -The modeling results in the different face condition are at odds with the data for the Favila et al model (they observe some differentiation in the paper and the model predicts no change). This could be due to a number of unmodeled factors, but it is perhaps worth noting.

      Thank you for pointing this out. It is possible to better capture the pattern of results observed by Favila et al. in their paper (with some differentiation in the different-face condition and even more differentiation in the same-face condition) by slightly adjusting the model parameters (specifically, by setting the oscillation amplitude Osc for the hidden layer to .1 instead of .067).

      Rather than replacing the old (Osc = .067) results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 8 - Supplement 1; see p. 45):

      We also added new text to the Favila Results, under “Differentiation and Integration” (p. 20):

      “Note also that the exact levels of differentiation that are observed in the different-face and same-face conditions are parameter dependent; for an alternative set of results showing some differentiation in the different-face condition (but still less than is observed in the same-face condition), see Figure 8 - Supplement 1.”

      -Related to my comment in the public review about pre-wiring associations, in the caption for Figure 9 (Schlichting model), the authors report "In both conditions, the pre-wired connection linking the "item B" hidden units to the "item X" output unit is set to .7. In the interleaved condition, the connection linking the "item A" hidden units to the "item X" output unit is set to .8, to reflect some amount of initial AX learning. In the blocked condition, the connection linking the "item A" hidden units to the "item X" output unit is set a higher value (.999), to reflect extra AX learning." What are the equivalent values for the other models, especially the Favila model since the structure is the same as Schlichting? I understood all the "strong" connections to be .99 unless otherwise stated. If that's the case, I don't understand why the blocked Schlichting model and the Favila model produce opposite effects. More clarity would be useful here.

      We have added a new paragraph to the results section for the Schlichting model (under “Differentiation and Integration”) to clarify why the blocked Schlichting model and the Favila model show different results (p. 24):

      “Note that the key feature driving integration in the blocked condition of this simulation is not the high strength of the connection from X to A on its own – rather, it is the asymmetry in the pretrained connection strengths from X to A (.999) and from X to B (.7). This asymmetry, which is meant to reflect the extensive training on A-X that occurred before the initial presentation of B-X, results in the A-X hidden representation decisively winning the competition during B-X presentation, which then leads to the B input also being linked to this representation (i.e., integration). It is instructive to compare this to the same-face condition from our simulation of Favila et al. (2016): In that simulation, the two pairmates are also linked strongly (.99 initial connection strength) to a shared associate, but in that case the connections are equally strong, so there is more balanced competition -- in this case, the competitor representation only comes to mind moderately (instead of displacing the target representation), so the result is differentiation instead of integration.”

      -The meaning of the different colored dots in Figure 5 is a bit hard to keep track of, even given the legend labels. The figure might benefit from a model sketch highlighting each of the different coactivity types. The left side of Fig 13 was useful but again somehow mapping on the colors would help further. Another note on these figures: what does having two dots of each color mean? Is it just an illustration of the variance? There would be more dots if there was one dot per coactivity value.

      We have updated Figure 5 and Figure 13 to clarify these points (including a clarification that the dots only represent a subset of the possible pairings between units).

      -While I appreciate the goal of the paper is to account for these three studies, readers who aren't familiar with or specifically interested in these studies may appreciate a small amount of intuition on why formalizing unsupervised learning models may be broadly important for computational investigations of learning/memory/cognition.

      We have added the following text under “Basic Network Properties” in the Introduction to address this point (p. 4):

      “Achieving a better understanding of unsupervised learning is an important goal for computational neuroscience, given that learning agents have vastly more opportunities to learn in an unsupervised fashion than from direct supervision (for additional discussion of this point, see, e.g., Zhuang et al., 2021).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this useful study, the authors report the efficacy, hematological effects, and inflammatory response of the BPaL regimen (containing bedaquiline, pretomanid, and linezolid) compared to a variation in which Linezolid is replaced with the preclinical development candidate spectinamide 1599, administered by inhalation in tuberculosis-infected mice. The authors provide convincing evidence that supports the replacement of Linezolid in the current standard of care for drug-resistant tuberculosis. However, a limitation of the work is the lack of control experiments with bedaquiline and pretomanid only, to further dissect the relevant contributions of linezolid and spectinamide in efficacy and adverse effects.

      We acknowledge that our study is limited by the lack of bedaquiline and pretomanid monotherapy groups; however, similar studies examining the contribution of bedaquiline and pretomanid to BPaL have already been published (references #4 and #60 in revised manuscript). Our goal was to compare BPaS with BPaL, with the understanding that TB treatment requires multidrug therapy. We omitted monotherapy groups to reduce the complexity of the studies, because the multidrug groups require a very large number of animals with very intensive and complex dosing schedules. Even if B or Pa alone had better efficacy than the BPa or BPaL combination, patients would not be treated with only B or Pa because of the very high risk of developing drug resistance to B and/or Pa. If resistance to B or Pa develops, the field will lose very effective drugs against TB.

      Although the manuscript is well written overall, a re-formulation of some of the stated hypotheses and conclusions, as well as the addition of text to contextualize translatability, would improve value.

      The manuscript has been edited to address these critiques. Answers to individual critiques are below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript is an extension of previous studies by this group looking at the new drug spectinamide 1599. The authors directly compare therapy with BPaL (bedaquiline, pretomanid, linezolid) to a therapy that substitutes spectinamide for linezolid (BPaS). The Spectinamide is given by aerosol exposure and the BPaS therapy is shown to be as effective as BPaL without adverse effects. The work is rigorously performed and analyses of the immune responses are consistent with curative therapy.

      Strengths:

      (1) This group uses 2 different mouse models to show the effectiveness of the BPaS treatment.

      (2) Impressively the group demonstrates immunological correlates associated with Mtb cure with the BPaS therapy.

      (3) Linezolid is known to inhibit ribosomes and mitochondria whereas spectinamide does not. The authors clearly demonstrate the lack of adverse effects of BPaS compared to BPaL.

      Weaknesses:

      (1) Although this is not a weakness of this paper, a sentence describing how the spectinamide would be administered by aerosolization in humans would be welcomed.

      We have already reported on the aerodynamic properties of dry powder spectinamide 1599 within #3 HPMC capsules and its delivery from an RS01 Plastiape inhaler device (reference #59 in revised manuscript). To address this critique, we added a last paragraph to the discussion: “It is proposed that human use of spectinamide 1599 will be administered using a dry powder formulation delivered by the RS01 Plastiape dry powder inhaler" (reference #59 in revised manuscript).

      Reviewer #2 (Public Review):

      Summary:

      Replacing linezolid (L) with the preclinical development candidate spectinamide 1599, administered by inhalation, in the BPaL standard of care regimen achieves similar efficacy, and reduces hematological changes and proinflammatory responses.

      Strengths:

      The authors not only measure efficacy but also quantify histological changes, hematological responses, and immune responses, to provide a comprehensive picture of treatment response and the benefits of the L to S substitution.

      The authors generate all data in two mouse models of TB infection, each reproducing different aspects of human histopathology.

      Extensive supplementary figures ensure transparency. 

      Weaknesses:

      The articulation of objectives and hypotheses could be improved.

      We edited the text to: "The AEs were associated with the long-term administration of the protein synthesis inhibitor linezolid. Spectinamide 1599 (S) is also a protein synthesis inhibitor of Mycobacterium tuberculosis with an excellent safety profile, but which lacks oral bioavailability. Here, we propose to replace L in the BPaL regimen with spectinamide administered via inhalation and we demonstrate that inhaled spectinamide 1599, combined with BPa (the BPaS regimen), has similar efficacy to that of the BPaL regimen while simultaneously avoiding the L-associated AEs."

      Reviewer #3 (Public Review):

      Summary:

      In this paper, the authors sought to evaluate whether the novel TB drug candidate, spectinamide 1599 (S), given via inhalation to mouse TB models, and combined with the drugs B (bedaquiline) and Pa (pretomanid), would demonstrate similar efficacy to that of BPaL regimen (where L is linezolid). Because L is associated with adverse events when given to patients long-term, and one of those is associated with myelosuppression (bone marrow toxicity) the authors also sought to assess blood parameters, effects on bone marrow, immune parameters/cell effects following treatment of mice with BPaS and BPaL. They conclude that BPaL and BPaS have equivalent efficacy in both TB models used and that BPaL resulted in weight loss and anemia (whereas BPaS did not) under the conditions tested, as well as effects on bone marrow.

      Strengths:

      The authors used two mouse models of TB that are representative of different aspects of TB in patients (which they describe well), intending to present a fuller picture of the activity of the tested drug combinations. They conducted a large body of work in these infected mice to evaluate efficacy and also to survey a wide range of parameters that could inform the effect of the treatments on bone marrow and on the immune system. The inclusion of BPa controls (in most studies) and also untreated groups led to a large amount of useful data that has been collected for the mouse models per se (untreated) as well as for BPa - in addition to the BPaS and BPaL combinations which are of particular interest to the authors. Many of these findings related to BPa, BPaL, untreated groups, etc corroborate earlier findings and the authors point this out effectively and clearly in their manuscript. To go further, in general, it is a well-written and cited article with an informative introduction.

      Weaknesses:

      The authors performed a large amount of work with the drugs given at the doses and dosing intervals stated, but at present, there is no exposure data available in the paper. It would be of great value to understand the exposures achieved in plasma at least (and in the lung if more relevant for S) in order to better understand how these relate to clinical exposures that are observed at marketed doses for B, Pa, and L as well as to understand the exposure achieved at the doses being evaluated for S. If available as historical data this could be included/cited. Considering the great attempts made to evaluate parameters that are relevant to clinical adverse events, it would add value to understand at what drug exposures effects such as anemia, weight loss, and bone marrow effects are being observed. It would also be of value to add an assessment of whether the weight loss, anemia, or bone marrow effects observed for BPaL are considered adverse, and the extent to which we can translate these effects from mouse to patient (i.e. what are the limitations of these assessments made in a mouse study?). For example, is the small weight loss seen as significant, or is it reversible? Is the magnitude of the changes in blood parameters similar to the parameters seen in patients given L? In addition, it is always challenging to interpret findings for combinations of drugs, so the addition of language to explain this would add value: for example, how confident can we be that the weight loss seen for only the BPaL group is due to L as opposed to a PK interaction leading to an elevated exposure and weight loss due to B or Pa?

      We totally agree with this critique, but the studies suggested by the reviewer are very expensive and logistically/resource intensive. Data reported in this manuscript were used as preliminary data in an R01 application to NIH-NIAID that included the studies proposed above by this reviewer. The authors are glad to report that the application received a fundable score and is currently under consideration for funding by NIH-NIAID. The summary of proposed future studies is included in the last paragraph of the discussion in this revised manuscript.

      Turning to the evaluations of activity in mouse TB models, unfortunately, the evaluations of activity in the BALB/c mouse model as well as the spleens of the Kramnik model resulted in CFU below/at the limit of detection and so, to this reviewer's understanding of the data, comparisons between BPaL and BPaS cannot be made and so the conclusion of equivalent efficacy in BALB/c is not supported with the data shown. There is no BPa control in the BALB/c study, therefore it is not possible to discern whether L or S contributed to the activity of BPaL or BPaS; it is possible that BPa would have shown the same efficacy as the 3 drug combinations. It would be valuable to conduct a study including a BPa control and with a shorter treatment time to allow comparison of BPa, BPaS, and BPaL. 

      We agree with the reviewer that these studies need to be done. Some of them were recently published by our colleague Dr. Lyons (reference #60 in revised manuscript). The studies proposed by the reviewer will be performed under a new award under consideration for funding by the NIH-NIAID; the summary of future studies is included in the last paragraph of the discussion in this revised manuscript.

      In the Kramnik lungs, as the authors rightly note, the studies do not support any contribution of S or L to BPa - i.e. the activity observed for BPa, BPaL, and BPaS did not significantly differ. Although the conclusions note equivalency of BPaL and BPaS, which is correct, it would be helpful to also include BPa in this statement;

      We edited the text; BPa is now included in line #191, as requested.

      It would be useful to conduct a study dosing for a longer period of time or assessing a relapse endpoint, where it is possible that a contribution of L and/or S may be seen - thus making a stronger argument for S contributing an equivalent efficacy to L. The same is true for the assessment of lesions - unfortunately, there was no BPa control meaning that even where equivalency is seen for BPaL and BPaS, the reader is unable to deduce whether L or S made a contribution to this activity.

      Added to the future plans in the last paragraph of the discussion:

      “Future studies are already under consideration for funding by NIH-NIAID to understand the pharmacokinetics of mono, binary and ternary combinations of BPaS. These studies also aim to identify the optimal dose level and dosing frequency of each regimen along with their efficacy and relapse-free sterilization potential. Studies are also planned using a model-based pharmacokinetic-pharmacodynamic (PKPD) framework, guided by an existing human BPa PKPD model (reference #61 in revised manuscript), to find allometric human dose levels, dosing frequencies and treatment durations that will inform the experimental design of future clinical studies.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Although this is not a weakness of this paper, a sentence describing how the spectinamide would be administered by aerosolization in humans would be welcomed.

      The last paragraph of the discussion was added: “It is proposed that human use of spectinamide 1599 will be administered using a dry powder formulation delivered by the RS01 Plastiape dry powder inhaler". We have already reported on the aerodynamic properties of dry powder spectinamide 1599 within #3 HPMC capsules and delivered from an RS01 Plastiape inhaler device (reference #59 in revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      The Abstract lacks focus and could more clearly convey the key messages.

      Edited as requested 

      The two mouse models and why they were chosen need to be described earlier. Currently, it's covered in the first section of the Discussion, but the reader needs to understand the utility of each model in answering the questions at hand before the first results are described, either in the introduction or in the opening section of the results.

      Thank you for the suggestion; we agree. We moved the first paragraph of the discussion to become the last paragraph of the Introduction.

      Line 130: Please justify the doses and dosing frequency for S. A reference to a published manuscript could suffice if compelling.

      The dosing and regimens were previously reported by our groups in refs 21 and 22 in the revised manuscript:

      (21) Robertson GT, Scherman MS, Bruhn DF, Liu J, Hastings C, McNeil MR, et al. Spectinamides are effective partner agents for the treatment of tuberculosis in multiple mouse infection models. J Antimicrob Chemother. 2017;72(3):770–7.

      (22) Gonzalez-Juarrero M, Lukka PB, Wagh S, Walz A, Arab J, Pearce C, et al. Preclinical Evaluation of Inhalational Spectinamide-1599 Therapy against Tuberculosis. ACS Infect Dis. 2021;7(10):2850–63. 

      Figures 1 E to H: several "ns" are missing, please add them.

      Edited as requested 

      Line 184 to 190: suggest moving the body weight plots to a Supplemental Figure, and at least double the size of the histology images to convey the message of lines 192-203.

      Please include higher magnification insets to illustrate the histopathological findings. In that same section, please add a sentence or two describing the lesion scoring concept/method. It is a nice added feature, not widespread in the field, and deserves a brief description.

      Edited as requested.  We added detailed description for scoring method in M&M under histopathology and lesion scoring

      Line 206: please add an introductory sentence explaining why one would expect S to cause (or not) hematological disruption, and why MCHC and RDW were chosen initially (they are markers of xyz). The first part of Figure 3 legend belongs to the Methods.

      To address this critique, we added in lines #225-226 “The effect of L in the blood profile of humans and mouse has been reported (references #38-42 in revised manuscript) but the same has not been reported for S”. In lines #229-230 we added “Of 20 blood parameters evaluated, two blood parameters were affected during treatment”.

      The first part of Figure 3 legend belongs to the Methods.

      We edited the Figure 3 legend to “During therapy of mice in Figure 1, the blood was collected at 1, 2- and 4-weeks posttreatment. The complete blood count was collected in VETSCAN® HM5 hematology analyzer (Zoetis)”.

      Line 218: please explain why the 4 blood parameters that are shown were selected, out of the 20 parameters surveyed.

      We added an explanation in lines 239-240: “out of 20 blood parameters evaluated, a total of four blood parameters were affected at 2 and 4 weeks of treatment”.

      Line 243 and again Line 262 (similar to comment Line 206): please add an introductory paragraph explaining the motivation to conduct this analysis and the objective. Can the authors put the experiment in the context of their hypothesis?

      To address this critique, we added in line #235-237 “The Nix-TB trial associated the long-term administration of L within the BPaL regimen as the causative agent resulting in anemia in patients treated with the BPaL regimen (5).”

      Figure 4C (and the plasma and lung equivalent in the SI). This figure needs adequate labeling of axes: X axis = LOG CFU? Please add tick marks for all plots since log CFU is only shown for the bottom line. Y axes have no units: pg/mL as in B?

      Figure legends were edited to add (Y axis: pg/ml) and (X axis: log10 CFU).

      Line 255-256: please remove "pronounced" and "profound". There is a range of CFU reduction and cytokine reduction, from minor to major. The correlation trend is clear and those words are not needed.

      Edited as requested 

      Line 277-289, Figure 6: given the heterogeneity of a C3HeB/FeJ mouse lung (TB infected), and the very heterogeneous cell population distribution in these lungs (Fig. 6A), the validity of whole lung analysis on 2 or 3 mice (the legend should state what 1, 2 and 3 means, individual mice?) is put into question. "F4/80+ cells were observed significantly higher in BPaS compared to UnRx control": Figure S14 suggests a statistically significant difference, but nothing is said about the other cell type, which appears just as much reduced in BPaS compared to UnRx as F4/80+. Overall, sampling the whole lung for these analyses should be mentioned as a limitation in the Discussion.

      We agree with the reviewer that "visually" it appears as though other populations in addition to F4/80 differ significantly. We re-ran the two-way ANOVA with Tukey's test, and only the BPaS versus UnRx comparison for F4/80 is significant.

      We edited Figure S16 (previously S14) to add "ns" for every comparison.

      Figure 6A was edited: n=2 mice for UnRx and n=3 mice each for BPaL and BPaS.

      Line 355-360: "The BPa and BPaL regimens altered M:E in the C3HeB/FeJ TB model by suppressing myeloid and inducing erythroid lineages" This suggests that altered M:E is not associated with L, putting into question the comparison between BPaS, BPaL, and UnRx. Can the authors comment on how M:E is altered in BPa and not in BPaS?

      Our interpretation of this result was that the addition of S in our BPaS regimen was capable of restoring the M:E ratio altered by BPa and BPaL. This interpretation was included in the main text in lines #263-264 and is now also added to the abstract.

      Line 379: discuss the limitations of working with whole lungs.

      We are sorry, but we do not understand this request. In our studies we always work with whole lungs if the expected course of histopathology/infection among lung lobes is very variable (as is the case in the C3HeB/FeJ TB model).

      Concluding paragraph: "Here we present initial results that are in line with these goals." If such a bold claim is made, there needs to be a discussion on the translatability of the route of administration and the dose of S. Otherwise, please rephrase.

      We added the following last paragraph to discussion:

      To conclude, the TB drug development field is working towards developing shorter and safer therapies with a common goal of developing new multidrug regimens of low pill burden that are accessible to patients, of short duration (ideally 2-3 months) and consist of 3-4 drugs of novel mode-of-action with proven efficacy, safety, and limited toxicity. Here we present initial results for new multidrug regimens containing inhaled spectinamide 1599 that are in line with these goals. It is proposed that human use of spectinamide 1599 will be administered using a dry powder formulation delivered by the RS01 Plastiape dry powder inhaler. We already reported on the aerodynamic properties of dry powder spectinamide 1599 within #3 HPMC capsules and delivered from an RS01 Plastiape inhaler device (reference #59 in revised manuscript). Future studies are already under consideration for funding by NIH-NIAID to understand the pharmacokinetics of mono, binary and ternary combinations of BPaS. These studies also aim to identify the optimal dose level and dosing frequency of each regimen along with their efficacy and relapse-free sterilization potential. Studies are also planned using a model-based pharmacokinetic-pharmacodynamic (PKPD) framework, guided by an existing human BPa PKPD model (references #60 and 61 in revised manuscript), to find allometric human dose levels, dosing frequencies and treatment durations that will inform the experimental design of future clinical studies.

      Minor edits

      Adverse events, not adverse effects (side effects)

      Edited as requested

      BALB/c (not Balb/c, please change throughout).

      Edited as requested

      Line 92: replace 'efficacy' with potency or activity.

      Edited as requested

      "Live" body weight: how is that different from "body weight"? Suggest deleting "live" throughout, or replace with "longitudinally recorded" if that's what is meant, although this is generally implied.

      Edited as requested

      The last line of Figure 2 legend is disconnected. 

      Line 331: delete "human".

      Edited as requested

      Reviewer #3 (Recommendations For The Authors):

      We thank the reviewer for these suggestions. The data presented in this manuscript, from 4 weeks of treatment along with monitoring of the effects of therapy on blood, bone marrow and immunity, were submitted as part of an R01 application to NIH-NIAID, which has received a fundable score and is under funding consideration. All the points suggested by the reviewer(s) here are included in the research proposed in the R01 application, including manufacturing and physico-chemical characterization of larger-scale batches of dry powder spectinamides and evaluation of their aerodynamic performance for human or animal use, and pharmacokinetic and efficacy studies to determine the optimal dose level and dosing frequency for new multidrug regimens containing spectinamides. These studies include mono, binary and ternary combinations of each multidrug regimen, along with their efficacy and relapse-free sterilization potential. These studies will also develop PK/PD simulation-based allometric scaling to aid in human dose projections for inhalation. We hope the reviewer will understand that, all together, these studies will last 4-5 years.

      Although I truly appreciate the great efforts of the authors, I suggest that in order to better evaluate the contribution of S versus L to BPa in these models, repeat studies be run that:

      (a) include BPa groups to allow the contribution of S and L to be assessed. Included in the research proposed in the R01 application mentioned above.

      (b) use shorter treatment times in BALB/c to allow comparisons at end of Tx CFU above the LOD. We have added new data for 2 weeks of treatment with BPaL and BPaS in BALB/c mice infected with MTb; these data had been removed from the previous submission of this manuscript.

      (c) use longer treatment times and ideally a relapse endpoint in Kramnik to allow assessment of L and S as contributors to BPa (i.e. give a chance to see better efficacy of BPaL or BPaS versus BPa) and also measure plasma exposures of all drugs (or lung levels if this is the translatable parameter for S) to allow detection of any large DDI and also understand the translation to the clinic. Related to the safety parameters, it would be really great to understand whether or not the observations for BPaL would be labeled adverse in a toxicology study/in a clinical study, and it would be useful to include information on the magnitude of observations seen here versus in the clinic (eg for the hematological parameters).

      The research proposed in the R01 application mentioned above includes extensive PK, extended periods of treatment beyond 1 month (2-5 months, as needed until no culturable bacteria are recovered from organs) and of course relapse studies.

      Minor point: I suggest rewording "high safety profile" when describing spectinamides in the intro - or perhaps qualify the length of dosing where the drug is well tolerated

      "high safety profile" was replaced by “an acceptable safety profile”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment  

      This important study builds on a previous publication (with partially overlapping authors), demonstrating that T. brucei has a continuous endomembrane system, which probably facilitates high rates of endocytosis. Using a range of cutting-edge approaches, the authors present compelling evidence that an actomyosin system, with the myosin TbMyo1 as the molecular motor, is localized close to the endosomal system in the bloodstream form (BSF) of Trypanosoma brucei. It shows convincingly that actin is important for the organization and integrity of the endosomal system, and that the trypanosome Myo1 is an active motor that interacts with actin and transiently associates with endosomes, but a role of Myo1 in endomembrane function in vivo was not directly demonstrated. This work should be of interest to cell biologists and microbiologists working on the cytoskeleton, and unicellular eukaryotes.

      We were delighted at the editors’ positive assessment and the reviewers’ rigorous, courteous, and constructive responses to the paper. We agree that a direct functional role for TbMyo1 in endomembrane activity was not demonstrated in the original submission, but have incorporated some new data (see new supplemental Figure S5) using the TbMyo1 RNAi cell line which are consistent with our earlier observations and interpretations.  

      Public Reviews:   

      Reviewer #1 (Public Review):  

      Using a combination of cutting-edge high-resolution approaches (expansion microscopy, SIM, and CLEM) and biochemical approaches (in vitro translocation of actin filaments, cargo uptake assays, and drug treatment), the authors revisit previous results about TbMyo1 and TbACT in the bloodstream form (BSF) of Trypanosoma brucei. They show that a great part of the myosin motor is cytoplasmic but the fraction associated with organelles is in proximity to the endosomal system. In addition, they show that TbMyo1 can move actin filaments in vitro and visualize for the first time this actomyosin system using specific antibodies, a "classical" antibody for TbMyo1, and a chromobody for actin. Finally, using latrunculin A, which sequesters G-actin and prevents F-actin assembly, the authors show the delocalization and eventually the loss of the filamentous actin signal as well as the concomitant loss of the endosomal system integrity. However, they do not assess the localization of TbMyo1 in the same conditions.  

      Overall the work is well conducted and convincing. The conclusions are not over-interpreted and are supported by the experimental results. 

      We are very grateful to Reviewer 1 for their balanced assessment. The reviewer is correct that we did not assess the localisation of TbMyo1 following latrunculin A treatment, but it is worth noting that Spitznagel et al. carried out this exact experiment in the earlier 2010 paper – we have mentioned this in the revised manuscript.

      Reviewer #2 (Public Review):  

      Summary:  

      The study by Link et al. advances our understanding of the actomyosin system in T. brucei, focusing on the role of TbMyo1, a class I myosin, within the parasite's endosomal system. Using a combination of biochemical fractionation, in vitro motility assays, and advanced imaging techniques such as correlative light and electron microscopy (CLEM), this paper demonstrates that TbMyo1 is dynamically distributed across early and late endosomes, the cytosol, is associated with the cytoskeleton, and a fraction has an unexpected association with glycosomes. Notably, the study shows that TbMyo1 can translocate actin filaments at velocities suggesting an active role in intracellular trafficking, potentially higher than those observed for similar myosins in other cell types. This work not only elucidates the spatial dynamics of TbMyo1 within T. brucei but also suggests its broader involvement in maintaining the complex architecture of the endosomal network, underscoring the critical role of the actomyosin system in a parasite that relies on high rates of endocytosis for immune evasion.

      Strengths:  

      A key strength of the study is its exceptional rigor and successful integration of a wide array of sophisticated techniques, such as in vitro motility assays, and advanced imaging methods, including correlative light and electron microscopy (CLEM) and immuno-electron microscopy. This combination of approaches underscores the study's comprehensive approach to examining the ultrastructural organization of the trypanosome endomembrane system. The application of functional data using inhibitors, such as latrunculin A for actin depolymerization, further strengthens the study by providing insights into the dynamics and regulatory mechanisms of the endomembrane system. This demonstrates how the actomyosin system contributes to cellular morphology and trafficking processes. Furthermore, the discovery of TbMyo1 localization to glycosomes introduces a novel aspect to the potential roles of myosin I proteins within the cell, particularly in the context of organelles analogous to peroxisomes. This observation not only broadens our understanding of myosin I functionality but also opens up new avenues for research into the cellular biology of trypanosomatids, marking a significant contribution to the field. 

      We are very pleased that the Reviewer felt the work is a significant contribution to the state of the art.  

      Weaknesses:  

      Certain limitations inherent in the study's design and scope render the narrative incomplete and make it challenging to reach definitive conclusions. One significant limitation is the reliance on spatial association data, such as colocalization of TbMyo1 with various cellular components-or the absence thereof-to infer functional relationships. Although these data suggest potential interactions, the authors do not confirm functional or direct physical interactions.  

      While TbMyo1's localization is informative, the authors do not directly demonstrate its biochemical or mechanical activities in vivo, leaving its precise role in cellular processes speculative. Direct assays that manipulate TbMyo1 levels, activity, and/or function, coupled with observations of the outcomes on cellular processes, would provide more definitive evidence of the protein's specific roles in T. brucei. A multifaceted approach, including genetic manipulations, uptake assays, kinetic trafficking experiments, and imaging, would offer a more robust framework for understanding TbMyo1's roles. This comprehensive approach would elucidate not just the "what" and "where" of TbMyo1's function but also the "how" and "why," thereby deepening our mechanistic insights into T. brucei's biology.  

      The reviewer is absolutely correct that the study lacks data on direct or indirect interactions between TbMyo1 and its intracellular partners, and this is an obvious area for future investigation. Given the generally low affinities of motor-cargo interactions, a proximity labelling approach (such as has already been successfully used in studies of other myosins) would probably be the best way to proceed.

      The reviewer is also right to highlight that a detailed mechanistic understanding of TbMyo1 function in vivo is currently lacking. We feel that this would be beyond the scope of the present work, but have included some new data using the TbMyo1 RNAi cell line (Figure S5), which are consistent with our previous findings.  

      Reviewer #3 (Public Review):  

      Summary:  

      In this work, Link and colleagues have investigated the localization and function of the actomyosin system in the parasite Trypanosoma brucei, which represents a highly divergent and streamlined version of this important cytoskeletal pathway. Using a variety of cutting-edge methods, the authors have shown that the T. brucei Myo1 homolog is a dynamic motor that can translocate actin, suggesting that it may not function as a more passive crosslinker. Using expansion microscopy, iEM, and CLEM, the authors show that MyoI localizes to the endosomal pathway, specifically the portion tasked with internalizing and targeting cargo for degradation, not the recycling endosomes. The glycosomes also appear to be associated with MyoI, which was previously not known. An actin chromobody was employed to determine the localization of filamentous actin in cells, which was correlated with the localization of Myo1. Interestingly, the pool of actomyosin was not always closely associated with the flagellar pocket region, suggesting that portions of the endolysosomal system may remain at a distance from the sole site of parasite endocytosis. Lastly, the authors used actin-perturbing drugs to show that disrupting actin causes a collapse of the endosomal system in T. brucei, which they have shown recently does not comprise distinct compartments but instead a single continuous membrane system with subdomains containing distinct Rab markers.

      Strengths:  

      Overall, the quality of the work is extremely high. It contains a wide variety of methods, including biochemistry, biophysics, and advanced microscopy that are all well-deployed to answer the central question. The data is also well-quantitated to provide additional rigor to the results. The main premise, that actomyosin is essential for the overall structure of the T. brucei endocytic system, is well supported and is of general interest, considering how uniquely configured this pathway is in this divergent eukaryote and how important it is to the elevated rates of endocytosis that are necessary for this parasite to inhabit its host.  

      We are very pleased that the Reviewer formed such a positive impression of the work. 

      Weaknesses:  

      (1) Did the authors observe any negative effects on parasite growth or phenotypes like BigEye upon expression of the actin chromobody?  

      Excellent question! There did appear to be detrimental effects on cell morphology in some cells, and it would definitely be worth doing a time course of induction to determine how quickly chromobody levels reach their maximum. The overnight inductions used here are almost certainly excessive, and shorter induction times would be expected to minimise any detrimental effects. We have noted these points in the Discussion.  

      (2) The Garcia-Salcedo EMBO paper cited included the production of anti-actin polyclonal antibodies that appeared to work quite well. The localization pattern produced by the anti-actin polyclonals looks similar to the chromobody, with perhaps a slightly larger labeling profile that could be due to differences in imaging conditions. I feel that the anti-actin antibody labeling should be expressly mentioned in this manuscript, and perhaps could reflect differences in the F-actin vs total actin pool within cells.  

      Implemented. We have explicitly mentioned the use of the anti-actin antibody in the Garcia-Salcedo paper in the revised Results and Discussion sections.  

      (3) The authors showed that disruption of F-actin with LatA leads to disruption of the endomembrane system, which suggests that the unique configuration of this compartment in T. brucei relies on actin dynamics. What happens under conditions where endocytosis and endocytic traffic are blocked, such as at 4 °C? Are there changes to the localization of the actomyosin components?

      Another excellent question! We did not analyse the localisation of TbMyo1 and actin under temperature block conditions, but this would definitely be a key experiment to do in follow-up work.

      (4) Along these lines, the authors suggest that their LatA treatments were able to disrupt the endosomal pathway without disrupting clathrin-mediated endocytosis at the flagellar pocket. Do they believe that actin is dispensable in this process? That seems like an important point that should be stated clearly or put in greater context.  

      Whether actin plays a direct or indirect role in endocytosis would be another fascinating question for future enquiry, and we do not have the data to do more than speculate on this point. Recent work in mammalian cells (Jin et al., 2022) has suggested that actin is primarily recruited when endocytosis stalls, and it could be that a similar role is at play here. We have noted this point in the Discussion. The observation of clathrin vesicles close to the flagellar pocket membrane and clathrin patches on the flagellar pocket membrane itself in the LatA-treated cells might suggest that some endocytic activity can occur in the absence of filamentous actin. 

      Recommendations for the authors:

      Note from the Reviewing Editor:  

      During discussion, all reviewers agreed that the role of TbMyo1 in vivo in endomembrane function had not been directly demonstrated. This could be done by testing the endocytic trafficking of (for example) fluorophore-conjugated TfR and BSA in the existing Myo1 RNAi line, using wide-field microscopy. Examining the endosomes/lysosomes' organization by thin-section EM would be even better. The actin signal detected by the chromobody tends to occupy a larger region than the MyoI. It's therefore conceivable that actin filamentation and stabilization via other actin-interacting proteins create the continuous endosomal structure, while MyoI is necessary for transport or other related processes.

      These are all excellent points and very good suggestions. We have now incorporated new data (supplemental Figure S5) that includes BSA uptake assays in the TbMyo1 RNAi cell line and electron microscopy imaging after TbMyo1 depletion – the results are consistent with our earlier observations.   

      Reviewer #1 (Recommendations For The Authors):  

      -  Figure S2E. This panel is supposed to show the downregulation of TbMyo1 in the PCF compared to BSF but there is no loading control to support this claim. This is important because the authors mention in lines 381-383 that this finding conflicts with the previous study (Spitznagel et al., 2010). The authors also indicate in the figure legend that there is 50% less signal but there is no explanation about this quantification.   

      Good point. Equal numbers of cells were loaded in each lane, but we did not have an antibody against a protein known to be expressed at the same level in both PCF and BSF cells to use as a loading control. Using a total protein stain would have been similarly unhelpful in this context, as the proteomes of PCF and BSF cells are dissimilar. The quantification was made by direct measurement after background subtraction, but without normalisation owing to the lack of a loading control. This makes the conclusion somewhat tentative, but given the large difference in signal observed between the two samples (and the fact that this is consistent with the proteomic data obtained by Tinti and Ferguson) we feel that the conclusion is valid. We have clarified these points in the figure legend and Discussion.  

      -  It is mentioned in the discussion, as unpublished observations, that the predicted FYVE motif of TbMyo1 can bind specifically PI(3)P lipids. This is a very interesting point that would be new and would strengthen the suggested association with the endosomal system mainly based on imaging data. 

      We agree that this is – potentially – a very exciting observation and it is an obvious direction for future enquiry.  

      The data are preliminary at this stage and will form the basis of a future publication. Given that the predicted FYVE domain of TbMyo1 and known lipid-binding activity of other class I myosins makes this activity not wholly unexpected, we feel that it is acceptable at this stage to highlight these preliminary findings.  

      -  The authors use the correlation coefficient to estimate the colocalization (lines 223-226). Although they clearly explain the difference between the correlation coefficient and the co-occurrence of two signals, I wonder if it would not be clearer for the audience to have quantification of the overlapping signals. Also, it is not mentioned on which images the correlation coefficient was measured. It seems that it is from widefield images (Figures 3E and 6E), and likely from SIM images for Figure 3C but the resolution is different. Are widefield images sufficient to assess these measurements? 

      With hindsight, and given the different topological locations of TbMyo1 and the cargo proteins (cytosolic and lumenal, respectively) it would probably have been wiser to measure co-occurrence rather than correlation, but we would prefer not to repeat the entire analysis at this stage. The correlations were measured from widefield images using the procedure described in the Materials & Methods. These are obviously lower resolution than confocal or SIM images would be, but are still of value, we believe. One further point – upon re-examination of some of the TbMyo1 transferrin (Tf) and BSA data, we noticed that there are many pixels with a value of 0 for Tf/BSA and a nonzero value for TbMyo1 and vice-versa. The incidence of zero-versus-nonzero values in the two channels will have lowered the correlation coefficient, and in this sense, the correlation coefficients are giving us a hint of what the immuno-EM images later confirm: that the TbMyo1 and cargo are present in the same locations, but in different proportions. We have added this point to the discussion.  
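      The distinction drawn above between correlation and co-occurrence can be made concrete with a toy calculation. The sketch below is only an illustration, not the procedure from the Materials & Methods: the function names, the Manders-style definition of co-occurrence, and the pixel values are all assumptions chosen for clarity. It shows two channels that occupy exactly the same pixels but in opposite proportions, which gives complete co-occurrence alongside a strongly negative correlation coefficient.

```python
import math

def pearson_r(ch1, ch2):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    n = len(ch1)
    mean1 = sum(ch1) / n
    mean2 = sum(ch2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        return float("nan")  # correlation is undefined for a flat channel
    return cov / math.sqrt(var1 * var2)

def manders(ch1, ch2):
    """Manders-style co-occurrence coefficients.

    m1 is the fraction of channel 1 intensity found in pixels where
    channel 2 is nonzero; m2 is the same with the channels swapped.
    """
    total1 = sum(ch1)
    total2 = sum(ch2)
    m1 = sum(a for a, b in zip(ch1, ch2) if b > 0) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > 0) / total2
    return m1, m2

# Hypothetical flattened pixel values: same locations, opposite proportions.
myo1 = [1, 2, 3, 4]
cargo = [4, 3, 2, 1]

r = pearson_r(myo1, cargo)      # strongly negative correlation
m1, m2 = manders(myo1, cargo)   # complete co-occurrence in both directions
```

      In this toy case every pixel contains both signals, so both co-occurrence coefficients are 1, while the correlation coefficient is -1. Pixels that are zero in one channel and nonzero in the other would lower both measures, but by different amounts.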

      -  It would be good to know if the loss of the endosomal system integrity (using EBI) is the same upon TbMyo1 depletion as in the latrunculin A-treated parasites.

      We agree! We have now included new data (Figure S5) that suggests endosomal system morphology is altered upon TbMyo1 depletion. We would predict that the effect upon TbMyo1 depletion is slower or less dramatic than upon LatA treatment (as LatA affects both actin and TbMyo1, given that TbMyo1 depends upon actin for its localisation).

      -  Conversely, it would be of interest to see how the localization of TbMyo1 changes upon latrunculin A treatment.

      This experiment was done in 2010 by Spitznagel et al., who observed a delocalisation of the TbMyo1 signal after LatA treatment. We have noted this in the Results and Discussion.

      Minor corrections:  

      -  Line 374: Figure S1 should be Figure S2. 

      Implemented (many thanks!).  

      -  Panel E of Figure S2 refers to TbMyo1 and should therefore be included in Figure S1 and not S2. 

      We would prefer not to implement this suggestion. We did struggle over the placing of this panel for exactly this reason, but as the samples were obtained as part of the experiments described in Figure S2, we felt that its placement here worked best in terms of the narrative of the manuscript.    

      -  Figure S2F: the population of TbMyo21 +Tet seems lost after 48 h although the authors mention that there is no growth defect. 

      Good eyes! We have re-added the panel, which shows that there was no growth defect in the tetracycline-treated population.  

      Reviewer #2 (Recommendations For The Authors):  

      Fig 1 vs. Figure 3: The biochemical fractionation experiments have been well-controlled, showing that 40% of TbMyo1 is found in both the cytosolic and cytoskeletal fractions, with only 20% in the organelle-associated fraction. The conclusion is supported by the experimental design, which includes controls to rule out cross-contamination between fractions. However, does this contrast with the widefield microscopy experiments, where the vast majority of the signal is in endocytic compartments and nowhere else?

      This is a good point. There are three factors that probably explain this. First, given that the actin cytoskeleton is associated with the endosomal system, a large proportion of the material partitioning into the cytoskeleton (P2) fraction is probably localised to the endosomal system (a fun experiment would be to repeat the fractionation with addition of ATP to the extraction buffer to make the myosin dissociate and see whether more appeared in the SN2 fraction as a result). Second, the 40% of the TbMyo1 that is cytosolic is distributed throughout the entire cellular volume, whereas the material localised to the endosomes is concentrated in a much smaller space and therefore produces a stronger signal. Third, the widefield microscopy images have had brightness and contrast adjusted in order to reduce “background” signal, though this will also include cytosolic molecules. We hope these explanations are satisfactory, but would welcome any additional thoughts from either the reviewer or the community.
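      The second factor is simple arithmetic and can be sketched as follows. The 40% and 20% figures are those quoted from the fractionation data; the volume fraction assigned to the endosomal system is a purely hypothetical placeholder (no such measurement is reported here), as are the function and variable names. Treating the cytosolic pool as spread over the whole cell volume is a further simplification.

```python
def relative_concentration(fraction_of_protein, fraction_of_volume):
    """Protein fraction divided by the fraction of cell volume it occupies."""
    return fraction_of_protein / fraction_of_volume

# Cytosolic pool: 40% of TbMyo1, assumed to be spread over the whole cell.
cytosolic = relative_concentration(0.40, 1.00)

# Organelle-associated pool: 20% of TbMyo1.
# HYPOTHETICAL assumption: the endosomal system occupies 5% of cell volume.
ENDOSOME_VOLUME_FRACTION = 0.05
organelle = relative_concentration(0.20, ENDOSOME_VOLUME_FRACTION)

# How much more concentrated the organelle-associated pool is locally.
fold_enrichment = organelle / cytosolic
```

      Under this assumed volume fraction the smaller pool is roughly ten times more concentrated locally than the cytosolic pool, which is the sense in which a minority of the protein can dominate the image. The true figure depends entirely on the real compartment volume.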

      The section title 'TbMyo1 translocates filamentous actin at 130 nm/s' could mislead readers by not specifying that the findings are from an in vitro experiment with a recombinant protein, which may not fully reflect the cell's complex context. Although this detail is noted in the figure legend, incorporating it into the main text and considering a title revision would ensure clarity and accuracy.  

      Good point. Implemented – we have amended the section title to “TbMyo1 translocates filamentous actin at 130 nm/s in vitro” and the figure legend title to “TbMyo1 translocates filamentous actin in vitro”.  

      The discussion of the translocation experiment could be better phrased addressing certain limitations. The in vitro conditions might not fully capture the complexity and dynamic nature of cellular environments where multiple regulatory mechanisms, interacting partners, and cellular compartments come into play. 

      Good point, implemented. We have added a note on this to the Discussion.  

      It is puzzling that RNAi, which is widely used in T. brucei, was not used to further investigate the functional roles of TbMyo1, given that the authors already had the cell line and used it to validate the specificity of the anti-TbMyo1 antibody. RNAi could have been employed to knock down TbMyo1 expression and observe the resultant effects on actin filament dynamics and organization within the cell. This would have directly tested TbMyo1's contribution to actin translocation observed in the in vitro experiments.

      It would obviously be interesting to carry out an in-depth characterisation of the phenotype following TbMyo1 depletion and whether this has an effect on actin dynamics. We have now included additional data (supplemental Figure S5) using the TbMyo1 RNAi cells and the results are consistent with our earlier observations and interpretations. It is worth noting too that at least for electron microscopy studies of intracellular morphology, the slower onset of an RNAi phenotype and the asynchronous replication of T. brucei populations make observation of direct (early) effects of depletion challenging – hence the preferential use of LatA here to depolymerise actin and trigger a faster phenotype.  

      I found that several declarative statements within the main text may not be fully supported by the overall evidence. I suggest modifications to present a more balanced view:

      Line 227: "The results here suggest that although the TbMyo1 distribution overlaps with that of endocytic cargo, the signals are not strongly correlated." This conclusion about the lack of strong correlation might mislead readers about the functional relationship between TbMyo1 and endocytic cargo, as colocalization does not directly imply functional interaction. 

      We would prefer not to alter this statement. It was our intention to phrase this cautiously, as we have not directly investigated the functional interplay between TbMyo1 and endocytic cargo and the subsequent sentence directs the reader to the Discussion for more consideration of this issue.    

      Line 397: "This relatively high velocity might indicate that TbMyo1 is participating in intracellular trafficking of BSF T. brucei and functioning as an active motor rather than a static tether." The statement directly infers TbMyo1's functional role from in vitro motility assay velocities without in vivo corroboration.

      We have amended the sentence in the Discussion to make it clear that it is speculative.  

      The hypothesis that cytosolic TbMyo1 adopts an auto-inhibited "foldback" configuration, drawn by analogy with findings from other studies, is intriguing. Yet, direct evidence linking this configuration to TbMyo1's function in T. brucei is absent from the data presented. 

      We have amended the sentence in the Discussion to make it clear that it is speculative. Future in vitro experiments will test this hypothesis directly.  

      The suggestion that a large cytosolic fraction of TbMyo1 indicates dynamic behavior, high turnover on organelles, and a low duty ratio is plausible but remains speculative without direct experimental evidence. Measurements of TbMyo1 turnover rates or duty ratios in T. brucei through kinetic studies would substantiate this claim with the necessary evidence.  

      We have amended the sentence in the Discussion to make it clear that it is speculative, and deleted the reference to a possible low duty ratio. Again, future in vitro experiments will measure the duty ratio of TbMyo1 using stopped-flow. 

      Reviewer #3 (Recommendations For The Authors):  

      Lines 171-172: The authors mention that MyoI could be functioning as a motor rather than a tether. The differences in myosin function have not been introduced prior to this. I would recommend explaining these differences and what it could mean for the function of the motor in the introduction to help a non-expert audience.

      Good point. Implemented.  

      Line 94-95: This phenotype only holds for the bloodstream form; the procyclic forms are quite resistant to actin RNAi and MyoI RNAi. I would clarify.

      Good point. Implemented.  

      Line 142-146: did the authors attempt to knock out the Myo21? 

      Good point. No, this was not attempted. Given the extremely low expression levels of TbMyo21 in the BSF cells we would not expect a strong phenotype, but this assumption would be worth testing. 

      Figure 3D: is there a reason why the authors chose to show the single-channel images in monochrome in this case?  

      Not especially. These panels are the only ones that show a significant overlap in the signals between the two channels (unlike the colabelling experiments with ER, Golgi), so greyscale images were used because of their higher contrast. 

      Line 397-398: I'm struggling a bit to understand how MyoI could be involved in intracellular trafficking in the endosomal compartments if the idea is that we have a continuous membrane? Some more detail as to the authors' thinking here would be useful.

      Implemented. We have noted that this statement is speculative, and emphasised that being an active motor does not automatically mean that it is involved in intracellular traffic – it could instead be involved in manipulating endosomal membranes. We have noted too that the close proximity between TbMyo1 and the lysosome (Figures 3-5) could be important in this regard. The lysosome is not contiguous with the endosomal system, and it is possible that TbMyo1 is working as a motor to transport material (class II clathrin-coated vesicles) from the endosomal system to the lysosome.

      Line 493-496: Does this mean that endocytosis from the FP does not require actin? This would be hard to explain considering the phenotypes observed in the original actin RNAi work. Is the BigEye phenotype observed in BSF actin RNAi and Myo1 RNAi cells due to some indirect effect?

      It seems possible that actin is not directly or essentially involved in endocytosis, and the characterisation of the actin RNAi phenotype would be worth revisiting in this respect – we have noted this in the Discussion. Although RNAi of actin was lethal, the phenotype appears less penetrant than that seen following depletion of the essential endocytic cofactor clathrin (based on the descriptions in Garcia-Salcedo et al., 2004 and Allen et al., 2003). BigEye phenotypes occur in BSF cells whenever there is some perturbation of endomembrane trafficking and are not necessarily a direct consequence of depletion – this is why careful investigation of early timepoints following RNAi induction is critical.

    1. Author response:

      We are very appreciative of the reviewers’ assessment that we used “solid and creative” methods to provide a “convincing demonstration” of “compelling theoretical results” on a “crucial but less-explored issue” in cognitive neuroscience. We are also grateful for their thoughtful suggestions for analyses and for pointing out areas where our analysis descriptions need more clarity. While we will respond to all comments in a future response and revision, here we provide information and clarification on a few central points.

      Localization of semantic content:

      Regarding our semantic analysis, one reviewer rightly pointed out that items with a high degree of semantic association, as captured by word2vec, tend to occur in the same images, and they expressed concern that this could drive our similarity results. We wish to clarify here (and will revise the manuscript accordingly) that we excluded all pairs of co-occurring items in our word2vec semantic analysis in order to avoid this issue. Thus, our results cannot be driven by the number of images within which items co-occurred. We also agree with the reviewer who stated that “semantic information” is a nebulous term in the cognitive neurosciences, and it appears to have led to some confusion as to the nature of our claims. We take a broad view of this term, with the perspective that visual features (e.g., color, shape) can contribute to semantic content rather than necessarily competing with it. In our work, we use word2vec to identify neural representations that reflect the kind of semantic content present in word embedding models—but the conclusions we draw do not depend on these representations being devoid of visual content. That is, we do not use word2vec to examine semantic versus visual representations, but rather to narrow down the set of representations to be considered in subsequent analyses. While there are a range of legitimate views on what should be considered a “semantic” representation, our broad view, which is inclusive of visual content, along with our strategy for localizing semantic content are both standardly used in the visual neuroscience literature. 
Prior work in this literature has compared the ability of word2vec and low-level visual models to predict neural responses to natural images and found that the brain regions in which activity is accurately predicted by the models are considerably distinct: whereas a low-level visual model best predicts activity in V1, V2, and V4, word2vec performs better in more anterior regions, including in visual areas such as lateral occipital cortex (Güçlü & van Gerven, 2015, arXiv). This suggests that our effects are unlikely to be explained by overlap in the kinds of low-level visual features mentioned by the reviewers. However, the semantic content we localize and the representation of high-level visual features may indeed overlap, and this is compatible with our claims. We will do more in our revision to be explicit about our intended meaning in our use of the word “semantic” and how our approach relates to and builds on prior work in this literature.
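      As a minimal sketch of the exclusion step described above, the following compares neural and word2vec similarity only over item pairs that never appear in the same image. The function names, the toy similarity values, and the use of a Spearman rank correlation are assumptions for illustration, not a description of our actual pipeline.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def semantic_match(items, neural_sim, model_sim, cooccurring):
    """Rank-correlate neural and model similarity over non-co-occurring pairs.

    Returns (Spearman rho, number of pairs kept). `cooccurring` is a set of
    frozensets naming item pairs that share at least one image.
    """
    pairs = [frozenset(p) for p in combinations(items, 2)]
    kept = [p for p in pairs if p not in cooccurring]
    x = [neural_sim[p] for p in kept]
    y = [model_sim[p] for p in kept]
    return pearson(average_ranks(x), average_ranks(y)), len(kept)


# Toy, made-up values (hypothetical): "car" and "road" co-occur and are dropped.
items = ["dog", "cat", "car", "road"]
all_pairs = [frozenset(p) for p in combinations(items, 2)]
cooccurring = {frozenset(("car", "road"))}
model_sim = dict(zip(all_pairs, [0.8, 0.2, 0.1, 0.25, 0.15, 0.7]))
neural_sim = dict(zip(all_pairs, [0.6, 0.3, 0.05, 0.35, 0.1, -0.5]))
rho, n_pairs = semantic_match(items, neural_sim, model_sim, cooccurring)
```

      Because the co-occurring pair is removed before the correlation is computed, its similarity values cannot influence the result, which is the point of the exclusion.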

      Long-term representational drift:

      We want to clarify our claims regarding the representational drift analysis. One reviewer stated that, while we show evidence of representational drift, we “provide no evidence suggesting that this long-term neural representational drift reflects a drift in semantic representation.” Another reviewer said: “The inference is that this [drift] is due to an updating of knowledge about the associations each item has had with other items,” and that our finding that semantic structure remains stable within these regions seems “to contradict the claims about semantic plasticity.” The claim we intended to make, which will be unpacked more clearly in our revision, is that the neural representations underlying semantic content drift over time, even if the semantic content itself is unchanging. In other words, we do not claim that our across-session drift analyses show changes in knowledge about object associations. Indeed, one of the reasons that representational drift has recently captured the attention of neuroscientists is that the neural representations underlying certain behaviors or cognitive content appear to drift over time even when the behaviors or cognitive content remain fixed. The relational structure of the neural representations can remain stable, even if the particular neurons recruited to represent each stimulus change over time (see, e.g., the T-maze in Rule, O’Leary, & Harvey, 2019, Curr Opin Neurobiol). Here we are translating these ideas, which were developed using animal models and/or primarily focused on low-level vision, to the semantic system in humans. The neural representations we identify in our paper capture semantic information because they share a similarity structure with word2vec, and the level of similarity to word2vec remains stable over time.
Thus, our findings provide a simple demonstration of long-term representational drift in the human semantic system akin to that reported in animals—drift in the neural semantic representations of items even as the relations between these item representations appear stable.

      Signal-to-noise variability across the MTL:

      A reviewer raised the possibility that differences between our ROIs could be driven by variability in signal-to-noise ratio (SNR) across regions, particularly within the medial temporal lobe (MTL). We looked at noise ceiling SNR brain maps for each participant, which reflect the reliability of neural responses across repetitions of the same image. Preliminary analyses indicate that SNR differences do not account for our object encoding, semantic content, representational drift, or short-term plasticity measures across the MTL.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      This research used cell-based signaling assay and Gaussian-accelerated molecular dynamics (GaMD) to study peptide-mediated signaling activation of Polycystin-1 (PC1), which is responsible for the majority of autosomal dominant polycystic kidney disease (ADPKD) cases. Synthetic peptides of various lengths derived from the N-terminal portion of the PC1 C-terminal fragment (CTF) were applied to HEK293T cells transfected with stalkless mouse CTF expression construct. It was shown that peptides including the first 7, 9, and 17 residues of the N-terminal portion could activate signaling to the NFAT reporter. To further understand the underlying mechanism, docking and peptide-GaMD simulations of peptides composed of the first 9, 17, and 21 residues from the N-terminal portion of the human PC1 CTF were performed. These simulations revealed the correlation between peptide-CTF binding and PC1 CTF activation characterized by the close contact (salt bridge interaction) between residues R3848 and E4078. Finally, a Potts statistical model was inferred from diverged PC1 homologs to identify strong/conserved interacting pairs within PC1 CTF, some of which are highly relevant to the findings from the peptide GaMD simulations. The peptide binding pockets identified in the GaMD simulations may serve as novel targets for the design of therapeutic approaches for treating ADPKD.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) The GaMD simulations all include exogenous peptides, thus lacking a control where no such peptide is present (and only stalkless CTF). An earlier study (PNAS 2022 Vol. 119 No. 19 e2113786119) covered this already, but it should be mentioned here that there was no observation of close/activation for the stalkless CTF.

      We appreciate the reviewer’s concern about the lack of a control where no exogenous peptide is present. As suggested by the reviewer, we are adding more details about the study on the stalkless CTF as a control in the Introduction of the revised manuscript. 

      (2) Although 5 independent trajectories were generated for each peptide, the authors did not provide sufficient details regarding the convergence of the simulation. This leaves some uncertainties in their results. Given that the binding poses changed relative to the starting docked poses for all three peptides, it is possible that some other binding pockets and/or poses were not explored.

      We appreciate the reviewer’s comment regarding the convergence of the simulation results. This is clarified in the revised manuscript as: 

      “We have calculated free energy profiles of individual simulations for each system, including the p9, p17, and p21, as shown below (Figs. S5, S6 and S8). For the p9 peptide, the “Bound” low-energy state was consistently identified in the 2D free energy profile of each individual simulation (Fig. S5). For the p17 peptide, Pep-GaMD simulations were able to refine the peptide conformation from the “Unbound” to the “Intermediate” and “Bound” states in Sim1 and Sim5, while the peptide reached only the “Intermediate” state in the other three simulations (Fig. S6). For the p21 peptide, Pep-GaMD was able to refine the peptide docking conformation to the “Bound” state in all five individual simulations (Fig. S8).”

      “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”
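      To illustrate how per-simulation 2D free energy profiles can be compared, here is a minimal sketch that Boltzmann-inverts a 2D histogram of two reaction coordinates and reports the lowest-energy bin of each run. It is a simplification under stated assumptions: GaMD trajectories additionally require reweighting of the boost potential, which this sketch omits, and the bin widths, the 300 K default, and all function names are hypothetical.

```python
import math
from collections import Counter

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def pmf_2d(xs, ys, x_width, y_width, temperature=300.0):
    """Return {bin: F} with F = -kT ln(P / P_max) in kcal/mol.

    Assumes unbiased (or already reweighted) samples; the 300 K default is
    an illustrative assumption.
    """
    counts = Counter(
        (math.floor(x / x_width), math.floor(y / y_width))
        for x, y in zip(xs, ys)
    )
    top = max(counts.values())
    kt = KB * temperature
    return {b: -kt * math.log(c / top) for b, c in counts.items()}


def lowest_energy_bin(pmf):
    """Return the bin index holding the global free energy minimum."""
    return min(pmf, key=pmf.get)


def share_global_minimum(pmfs):
    """True if every independent run places its global minimum in the same bin."""
    return len({lowest_energy_bin(p) for p in pmfs}) == 1
```

      Agreement of the global minimum across runs is only a coarse consistency check; it does not establish that the profiles themselves are converged.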

      (3) The free energy profiles (Figures 2 to 4) based on the selected coordinates provide important information regarding binding and CTF conformational change. However, it is a coarse-grained representation and complementary analysis such as RDFs, and/or contact maps between the peptide and CTF residues might be helpful to understand the details of their interactions. These details are currently only available in the text.

      Following the reviewer's suggestion, we have now included a set of protein contact maps showing contacts between the peptides and the TOP domain for each peptide in the representative "Bound” state in revised Supplementary Information (Fig. S4). The contact maps serve to visualize the list of contacts mentioned in the main text. This will be clarified in the revised manuscript.

      (4) The use of a stalkless CTF is necessary for studying the functions of the exogenous peptides. However, the biological relevance of the stalkless CTF to ADPKD was not clearly explained, if any.

      We appreciate the reviewer’s comment. As correctly assessed by the reviewer, the stalkless CTF is not a biological form of PC1 observed in ADPKD, but rather was used as the simplest or least complex system in which the activities and binding of exogenous peptides could be studied. However, in ADPKD, there are numerous missense mutations reported within the GPCR autoproteolysis-inducing (GAIN) domain that have been shown to prevent or inhibit cleavage at the GPCR-coupled proteolysis site (GPS). Loss of PC1 GPS cleavage, which is known to cause ADPKD, would retain or sequester the stalk tethered agonist within the interior of the GAIN domain, which would presumably interfere with interactions between stalk tethered agonist residues and the remainder of the CTF. Furthermore, there are 10 single nucleotide polymorphisms reported within the stalk sequence (ADPKD Variant Database; https://pkdb.mayo.edu/welcome), most of which we have found to significantly reduce CTF-mediated activation of the NFAT reporter (Magenheimer BS, et al., Constitutive signaling by the C-terminal fragment of polycystin1 is mediated by a tethered peptide agonist; bioRxiv 2021.08.05.455255). In particular, the ADPKD-associated G3052R stalk mutation that was analyzed along with the stalkless CTF by GaMD simulations (Pawnikar et al, PNAS, 2022) has the same reduction in activity as the stalkless CTF in the cellular signaling reporter assays and the same loss of closed conformation interactions in GaMD analyses. As such, we believe the stalkless CTF has biological relevance from the aspect that it mimics the deficiency in signaling activation observed for PC1 CTF stalk mutants. 
This is clarified in the revised manuscript in the Introduction, page 5 (“constructs encoding a stalkless PC1 CTF (a nonbiological mutant of PC1 with deletion of the first 21 N-terminal residues of CTF) and three ADPKD-associated…”), and near the beginning of the Discussion, page 16, where the biological relevance of studying the stalkless CTF is explained.

      (5) The authors might want to clarify if a stalkless CTF is commonly seen in ADPKD, or if it is just a construct used for this study.

      The stalkless CTF is not a biological form of PC1, but rather a construct used for this study. This was clarified in the revised manuscript (see response above).

      (6) (Pages 7-8) "...we generated expression constructs of mouse (m) PC1 consisting of the CD5 signal peptide sequence fused in frame with the stalk sequence of mCTF ...". What is the CD5 signal peptide sequence here? What is its use?

      The CD5 signal peptide sequence is “MPMGSLQPLATLYLLGMLVASVLG” from the T cell surface glycoprotein, CD5. Since the N-terminus of PC1 CTF is derived from a posttranslational, autocatalytic, endoproteolytic cleavage event, this isoform is already membrane-embedded and therefore lacks its endogenous signal peptide. The CD5 signal peptide coding sequence is added to the PC1 CTF expression constructs in order to ensure translation and insertion of the encoded protein at the endoplasmic reticulum. Additional details were added to the Experimental Procedures, page 2 of Supporting Information.

      (7) (Page 8) "All peptides were appended with a C-terminal, 7-residue hydrophilic sequence (GGKKKKK) to increase solubility". How did the authors make sure that this sequence has no influence on the signaling? 

      To determine the possible effect of the hydrophilic GGKKKKK sequence on signaling, we had a ‘solubility tag’ peptide (LGGKKKKK) synthesized and purified by GenScript. It was necessary to add an N-terminal Leu residue to the 7-residue hydrophilic tag sequence in order for the highly hydrophilic peptide to be recovered. Effect of treatment with the solubility tag peptide on activation of the NFAT reporter was assessed for both empty vector- and ∆stalkCTF-transfected cells in 3 separate signaling experiments (see figure below). Each experiment also included a negative control treatment (no peptide/culture medium only addition) and a positive control treatment (stalk peptide p17). The p17 peptide we had available was derived from the stalk sequence of human PC1 that differs from the mouse PC1 sequence at residues 15 and 17, which are two poorly conserved positions within the stalk sequence (see Reviewer 2, Response 3). In the first experiment with the solubility tag and human p17 peptides (B in figure below), we inadvertently used the empty expression vector and ∆stalkCTF expression construct from mouse PC1. After realizing our error, we then performed 2 additional signaling experiments (C and D in figure below) with the ‘correct’ human ∆stalkCTF expression construct and empty vector. In the revised manuscript, we have provided the results from each of the 3 experiments as Fig. S2 (below).

      (8) (Page 9) "Using a computational model of the ΔStalk PC1 CTF developed previously". The authors might want to expand here a little to give a short review about the structure preparation.

      We appreciate the reviewer’s suggestion regarding the addition of details on structure preparation for the stalkless CTF. We have added these details in the section “Docking and Pep-GaMD simulations of peptide agonist binding to stalkless PC1 CTF” on page 10 of the revised manuscript: “The cryo-EM structure of the human PC1-PC2 complex (PDB: 6A70) was used to build the computational model for WT PC1 CTF. As the protein had several missing regions, including the Stalk and several loops, homology modeling of the missing regions was done using the I-TASSER web server. Using the WT PC1 CTF model, the computational model for ΔStalk was generated by deleting the first 21 residues (3049-3069) of the WT PC1. Using the structure of the stalkless CTF, we successfully docked the p9, p17 and p21 stalk peptides with HPEPDOCK. The peptides all bound to the TOP domain and the interface between the TOP domain and extracellular loop 1 (ECL1) of CTF.”

      (9) How was "contact" defined when counting the number of contacts used in the 2D PMFs (Figures 2-4)?

      We appreciate the reviewer’s comment regarding the definition of the number of contacts used in the 2D PMFs. This has been clarified in the revised manuscript as: “The number of contacts is calculated between any atom pairs within 4 Å distance of the peptide and extracellular domains of PC1 protein.”
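      The contact definition above can be sketched as a simple pair count. This is only an illustration: the function name and the plain coordinate tuples are hypothetical, and whether hydrogen atoms are included in the atom selections is not specified here.

```python
import math


def count_contacts(peptide_atoms, protein_atoms, cutoff=4.0):
    """Count peptide-protein atom pairs separated by at most `cutoff` angstroms.

    Each atom is an (x, y, z) tuple in angstroms. The 4.0 default follows the
    4 angstrom cutoff quoted in the text.
    """
    return sum(
        1
        for p in peptide_atoms
        for q in protein_atoms
        if math.dist(p, q) <= cutoff
    )


# Made-up coordinates (hypothetical) to show the call.
peptide = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (12.0, 0.0, 0.0)]
n_contacts = count_contacts(peptide, protein)
```

      This brute-force loop scales with the product of the two selection sizes; trajectory analysis packages typically use neighbor lists or grids for the same count.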

      (10) How was the ranking of GaMD clusters done? It looks from Figure 3A that the "intermediate" state is more favorable compared to the "bound" state, but it was claimed in the text the "bound" state was ranked 1st. 

      Thanks to the reviewer for this comment. It has been clarified in the revised Supplementary Information: “Three independent Pep-GaMD simulations were combined to perform structural clustering using the hierarchical agglomerative clustering algorithm in CPPTRAJ. A 3 Å RMSD cutoff was used for each peptide system. PyReweighting was then applied to calculate the original free energy values of each peptide structural cluster with a cutoff of 500 frames. The structural clusters were finally ranked according to the reweighted free energy values.” And in the revised main text: “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. The free energy values of 2D PMF minima shown in Figure 3A could differ from those in the 1D PMF minima of peptide structural clusters, especially with the usage of distinct reaction coordinates. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”

      (11) When mentioning residue pair distances, such as in the sentence "The distance between the TOP domain residue R3848 and PL residue E4078 was 3.8 Å (Fig. 4D)" on page 12, it should be clarified if these distances are average distance, or a statistical error can be given.

      We appreciate the reviewer’s comment regarding the TOP Domain and PL distance between residues R3848-E4078. This has been clarified on page 14 in the revised manuscript as:

      “The distance between the TOP domain residue R3848 and PL residue E4078 was 3.8 Å. The distance was extracted from the top-ranked structural cluster of the p21 bound to the ΔStalk CTF, corresponding to the “Closed/Active” low-energy conformational state. (Fig. 4E)”.

      (12) More analysis of the GaMD can be performed. For example, the authors observed a single "bound" state for p21, but there must be some flexibility in the peptide and the protein itself. The authors might want to consider adding some plots illustrating the flexibility of the peptide residues (for example, a RMSD plot). Contact maps can also be added to visualize the results currently discussed in the text. 

      We thank the reviewer for their constructive suggestions. To characterize flexibility of the peptide and protein in the revised manuscript, we have added plots of the TOP-PL interaction distance between residues R3848-E4078 in PC1, the radius of gyration (Rg) of p21 and the root-mean-square deviation (RMSD) of p21 relative to the starting HPEPDOCK conformation of the peptide in the new Fig. S7. The peptide-protein contact map has also been added in the new Fig. S4.
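      For readers less familiar with these flexibility measures, the sketch below shows how Rg and RMSD are defined for a single frame. It is a generic illustration, not the analysis code used for Fig. S7: the function names are hypothetical, and the RMSD assumes the frame has already been superimposed on the reference.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed if none given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass along each Cartesian axis.
    center = tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )
    weighted_sq = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total)


def rmsd(coords, reference):
    """Root-mean-square deviation between matched atoms, without superposition."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(sq / len(coords))
```

      Plotting both quantities per frame, together with a residue-pair distance computed with `math.dist`, gives the kind of time series described in the response.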

      (13) (Page 7) In the sentence `...sampled the "Closed/Active" low-energy state relative to the large number of Stalk-TOP contacts`, I suggest using "related to" instead of "relative to"

      We thank the reviewer for the comment, and we have replaced "relative to" with “related to” in the following sentence: `...sampled the "Closed/Active" low-energy state relative to the large number of Stalk-TOP contacts`.

      (14) (Page 7) In the sentence `Our previous study utilized expression constructs of human PC1 CTF, however, in order to prepare for ...`, "PC1 CTF, however," -> "PC1 CTF. However,"

      We thank the reviewer for the comment, and we have replaced "PC1 CTF, however," with "PC1 CTF. However," in the following sentence: `Our previous study utilized expression constructs of human PC1 CTF, however, in order to prepare for ...`.

      Reviewer 2:

      The autosomal dominant polycystic kidney disease (ADPKD) is a major form of polycystic kidney disease (PKD). To provide better treatment and avoid side effects associated with currently available options, the authors investigated an interesting GPCR, polycystin-1 (PC1), as a potential therapeutic target. In vitro and in silico studies were combined to identify peptide agonists for PC1 and to elucidate their roles in PC1 signaling. Overall, regarding the significance of the findings, this work described valuable peptide agonists for PC1 and the combined in vitro and in silico approach can be useful to study a complex system like PC1. However, the strength of the evidence is incomplete, as more experiments are needed as controls to validate the computational observations. The work appears premature.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) The therapeutic potential of PC1 peptide agonists is unclear in the introduction. For example, while the FDA-approved drug Jynarque was mentioned, the text was misleading as it sounded like Jynarque targeted PC1. In fact, it targets another GPCR, the vasopressin receptor 2 (V2). A clear comparison of targeting PC1 over V2 pathways and their therapeutic relevance can help the readers better understand the importance of this work. Importantly, a clear background on the relationship between PC1 agonism and treatments for ADPKD is necessary.

      We understand the confusion that was caused by the brevity of our introductory paragraph and will clarify the differences in therapeutic targeting between Jynarque and our PC1 stalk-derived peptides in the revised manuscript. We will also expound on the rationale for targeting PC1 agonism as a therapeutic approach for ADPKD versus Jynarque. For example: It is known that ADPKD disease severity is dependent on the functional levels of PC1. Jynarque is a small molecule antagonist of the arginine vasopressin receptor 2, V2R, whose signaling and production of cAMP have been shown to be increased in ADPKD. As this drug targets one of the downstream aberrant pathways, it is only capable of slowing disease progression and has numerous undesirable side effects. We reasoned that a therapeutic agent capable of stimulating and thus augmenting PC1 signaling function would be a safer, cyst initiation-proximal treatment capable of preventing cyst formation with few side effects.

      (2) PC1 is a complex membrane protein, and most figures focus on the peptide-binding site. For general readers (or readers that did not read the previous PNAS publication), it is hard to imagine the overall structure and understand where the key interactions (e.g., R3848-E4078) are in the protein and how peptide binding affects locally and globally. I suggest enhancing the illustrations.

      We thank the reviewer for the constructive comment on adding more illustrations for the PC1 protein to understand the overall structure and the location of the key interaction R3848-E4078. We have included these suggestions and modified the main figures in the revised manuscript.

      (3) The authors used the mouse construct for the cellular assays and the peptide designs in preparation for future in vivo assays. This is helpful in understanding biology, but the relevance of drug discovery is weakened. Related to Point 1, the therapeutic potential of PC1 peptide agonist is largely missing.

      The therapeutic potential of a PC1 peptide agonist is addressed in response #1 above. As mentioned in the manuscript and recognized by the reviewer, the cellular signaling assays were performed with the mouse PC1 CTF expression construct and with peptides based on the mouse PC1 stalk sequence for future, pre-clinical studies, while the peptide binding studies were performed with the human PC1 stalk sequence. We feel the relevance for drug discovery is not significantly weakened for a number of reasons: 1) as shown in Fig. 1A, the stalk sequence is highly conserved between mouse and human PC1; specifically, there are only 2 residue differences present within peptides p17 and p21. One of the differences is a ‘semi-conservative’ Gln-Arg substitution at peptide residue 15, while the second difference is a conservative Ile-Val substitution at peptide residue 17; 2) we have found that an Arg to Cys mutation within the mouse PC1 CTF stalk has the same effect on signaling as the corresponding human Gln to Cys ADPKD-associated mutation which was analyzed in Pawnikar et al., 2022; 3) both peptide residues 15 and 17 represent highly variable positions within the PC1 stalk as shown in the sequence logo (below) of the stalk sequence from 16 vertebrate species; and 4) while addressing the potential effect of the hydrophilic solubility tag on stalk peptide-mediated rescue of CTF∆stalk signaling (see Reviewer 1 comments, point #7), we utilized the ‘human’ version of p17 as a positive control and tested its activation with both mouse and human CTF∆stalk expression constructs and found that human p17 peptide was also capable of stimulating the mouse CTF∆stalk protein (Fig. S2).

      Author response image 1.

      (4) More control experiments are needed. For example, a 7-residue hydrophilic sequence (GGKKKKK) is attached to the peptide design to increase solubility. This 7-residue peptide should be tested for PC1 activation as a control. Second, there is no justification for why the peptide design must begin with residue T3041. Can other segments of the stalk also be agonists?

      As mentioned above for Reviewer 1, the hydrophilic peptide has been synthesized and tested for activation of signaling by the stalkless CTF, and the results are included in the revised manuscript as Fig. S2. The design of peptides that begin with residue T3041 of mouse PC1 CTF is modeled on numerous similar studies for the family of adhesion GPCRs. Optimization of the binding and activity of the PC1 peptide agonist will be investigated in future studies and could include such parameters as whether the peptide must include the first residue and whether subsegments of the stalk are also agonists. However, we feel these questions are beyond the scope of this initial report.

      (5) There are some major concerns about the simulations: The GaMD simulations showed different binding sites of p-21, p-17, and p-9, and the results report the simulated conformations as "active conformational states". However, these are only computational findings without structural biology or mutagenesis data to validate. Further, neither docking nor the simulation data can explain the peptide SAR. Finally, it will be interesting if the authors can use docking or GaMD and explain why some peptide designs (like P11-P15) are less active (as control simulations).

      The reviewer brings up an important observation regarding differences in binding sites between peptides p9, p17 and p21. We will include discussion of this observation and our interpretations in the revised manuscript. While the present study is focused on identification of initial peptides that are able to activate the PC1 CTF, we shall include further mutation experiments and simulations, peptide SAR and optimization of the lead peptides in future studies. This has been clarified in the revised manuscript.

      (6) Additional experiments for the controls and for validating the simulations. Additional simulations to explain the SAR.

      We appreciate the reviewer’s comment for additional experiments for the controls and additional simulations to explain the SAR. For future studies, we shall include further mutation experiments and simulations, peptide SAR and optimization of the lead peptides.

      (7) What is the selectivity of the peptides between PC1 and PC2?

      We have not tested the selectivity of the peptides for PC1 versus PC2 primarily because transfection of PC2 does not activate the NFAT reporter. However, it is possible that co-transfection of PC2 with the PC1 CTF could alter stalk peptide binding. This will be important to consider in future studies.

      Reviewer 3:

      The authors demonstrate the activation of Polycystin-1 (PC1), a G-protein coupled receptor, using small peptides derived from its original agonist, the stalk TA protein. In the experimental part of the study, the authors performed cellular assays to check the peptide-induced reactivation of a mutant form of PC1 which does not contain the stalk agonist. The experimental data is supported by computational studies using state-of-the-art Gaussian accelerated Molecular Dynamics (GaMD) and bioinformatics analysis based on sequence covariance. The computer simulations revealed the mechanistic details of the binding of the said peptides with the mutant PC1 protein and discovered different bound, unbound, and intermediate conformations depending on the peptide size and sequence. The use of reliable and well-established molecular simulation algorithms and the physiological relevance of this protein to autosomal dominant polycystic kidney disease (ADPKD) make this work particularly valuable.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) No control has been used for the computational (GaMD) study as the authors only report the free energy surface for 3 highly agonistic peptides but for none of the other peptides that did not induce an agonistic effect. Therefore, in the current version, the reliability of the computational results is not foolproof.

      We appreciate the reviewer’s concern about the lack of control with the other peptides that did not induce an agonistic effect. To address the reviewer’s concern, we have included more details on the study of the stalkless CTF and the solubility tag peptide (Fig. S2) as controls in the revised manuscript.

      (2) All discussions about the residue level interactions focused only on geometric aspects (distance, angle, etc) but not the thermodynamic aspect (e.g. residue-wise interaction energy). Considering they perform a biased simulation; the lack of interaction energy analysis only provides a qualitative picture of the mechanism.

      As suggested by the reviewer, we have added MM/PBSA analysis results in the revised manuscript and SI.

      Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) analysis was performed to calculate the binding free energies of peptides p9, p17 and p21 to PC1 CTF. The analysis was performed using the trajectory in which the peptide was bound to the receptor. In MM/PBSA, the binding free energy of the ligand (L) to the receptor (R) to form the complex (RL) is calculated as:

      $$\Delta G_{bind} = G_{RL} - G_{R} - G_{L}$$

      where $G_{RL}$ is the Gibbs free energy of the complex RL, and $G_{R}$ and $G_{L}$ are the Gibbs free energies of the molecules R and L in their unbound states, respectively.

      $\Delta G_{bind}$ can be divided into contributions of different interactions as:

      $$\Delta G_{bind} = \Delta H - T\Delta S = \Delta E_{MM} + \Delta G_{sol} - T\Delta S$$

      in which

      $$\Delta E_{MM} = \Delta E_{int} + \Delta E_{elec} + \Delta E_{vdW}$$

      $$\Delta G_{sol} = \Delta G_{PB/GB} + \Delta G_{SA}$$

      $$\Delta G_{SA} = \gamma \cdot SASA + b$$

      where $\Delta E_{MM}$, $\Delta G_{sol}$, $\Delta H$ and $-T\Delta S$ are the changes in the gas-phase molecular mechanics (MM) energy, solvation free energy, enthalpy and conformational entropy upon ligand binding, respectively. $\Delta E_{MM}$ includes the changes in the internal energies $\Delta E_{int}$ (bond, angle and dihedral energies), electrostatic energies $\Delta E_{elec}$, and the van der Waals energies $\Delta E_{vdW}$. $\Delta G_{sol}$ is the sum of the electrostatic solvation energy $\Delta G_{PB/GB}$ (polar contribution) and the nonpolar contribution $\Delta G_{SA}$ between the solute and the continuum solvent. The polar contribution is calculated using either the Poisson-Boltzmann (PB) or Generalized Born (GB) model, while the nonpolar energy is usually estimated using the solvent-accessible surface area (SASA), where $\gamma$ is the surface tension coefficient and $b$ is the constant offset. The change in conformational entropy $-T\Delta S$ is usually calculated by normal-mode analysis on a set of conformational snapshots taken from MD simulations. However, because of its large computational cost, the change in conformational entropy is usually neglected when, as here, the main concern is the relative binding free energies of similar peptide ligands.
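      The decomposition described above amounts to simple bookkeeping, which the following sketch makes explicit. The class and field names are hypothetical, the numbers in the example are made up for illustration (they are not values from Table S1), and the entropy term defaults to zero to mirror the common practice of neglecting it.

```python
from dataclasses import dataclass


@dataclass
class MMPBSATerms:
    """Energy changes upon binding, all in kcal/mol (names are illustrative)."""

    d_e_int: float       # internal (bond, angle, dihedral) energy change
    d_e_elec: float      # gas-phase electrostatic energy change
    d_e_vdw: float       # van der Waals energy change
    d_g_polar: float     # polar solvation term (PB or GB)
    d_g_nonpolar: float  # nonpolar solvation term (from SASA)
    minus_t_ds: float = 0.0  # entropy term, neglected by default

    @property
    def d_e_mm(self):
        """Gas-phase molecular mechanics energy change."""
        return self.d_e_int + self.d_e_elec + self.d_e_vdw

    @property
    def d_g_sol(self):
        """Solvation free energy change (polar plus nonpolar)."""
        return self.d_g_polar + self.d_g_nonpolar

    @property
    def d_g_bind(self):
        """Binding free energy: MM energy + solvation + entropy term."""
        return self.d_e_mm + self.d_g_sol + self.minus_t_ds


def nonpolar_from_sasa(d_sasa, gamma, offset):
    """Nonpolar solvation energy as gamma * SASA change + constant offset."""
    return gamma * d_sasa + offset


# Made-up component values (hypothetical), only to show the arithmetic.
example = MMPBSATerms(
    d_e_int=0.0, d_e_elec=-100.0, d_e_vdw=-50.0,
    d_g_polar=120.0, d_g_nonpolar=-5.0,
)
```

      With the entropy term left at zero, the result is better read as an effective binding energy for ranking similar ligands than as an absolute binding free energy.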

      MM/PBSA analysis was performed using the gmx_MMPBSA software with the following command line:

      gmx_MMPBSA -O -i mmpbsa.in -cs com.tpr -ci index.ndx -cg 1 13 -ct com_traj.xtc -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv

      Input file for running MM/PBSA analysis:

      &general
      sys_name="Prot-Pep-CHARMM",
      startframe=1, endframe=200,
      # In gmx_MMPBSA v1.5.0 we have added a new PB radii set named charmm_radii.
      # This radii set should be used only with systems prepared with CHARMM force fields.
      # Uncomment the line below to use charmm_radii set
      # PBRadii=7,
      /
      &pb
      # radiopt=0 is recommended which means using radii from the prmtop file for both the PB calculation and for the NP
      # calculation
      istrng=0.15, fillratio=4.0, radiopt=0
      /

      The relative ranking of the overall peptide binding free energies (Table S1) was consistent with the experimental signaling data, i.e., p21 > p9 > p17, with p21 showing the most favorable (most negative) binding free energy (−40.29 ± 6.94 kcal/mol).

      (3) It is not mentioned clearly whether the reader should interpret the free energy landscapes quantitatively or qualitatively. Considering no error analysis or convergence plots are reported for the GaMD free energy surfaces, it may be assumed the results are qualitative. The readers should consider this caveat and not try to quantitatively reproduce these free energy landscapes with other comparable techniques.

      We appreciate the reviewer’s comment on whether the free energy landscapes should be interpreted quantitatively or qualitatively. The presented free energy landscapes can be considered semi-quantitative, since the simulations are not fully converged. This will be clarified in the revised manuscript as: “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”

      (4) Energy decomposition analysis similar to the following paper (https://pubs.acs.org/doi/10.1021/bi201856m) should be provided to understand the residue level enthalpic contribution in the peptide-protein interaction.

      As suggested by the reviewer, we have performed residue-wise interaction energy analysis and included the results in the revised manuscript and SI.

      Residue-wise interaction energy analysis was performed on peptides p9, p17 and p21 using the trajectory in which the peptide was bound to the PC1 CTF using the gmx_MMPBSA software with the following command line:

      gmx_MMPBSA -O -i mmpbsa.in -cs com.tpr -ct com_traj.xtc -ci index.ndx -cg 3 4 -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv -do FINAL_DECOMP_MMPBSA.dat -deo FINAL_DECOMP_MMPBSA.csv

      Input file for running residue-wise energy decomposition analysis:

      &general
      sys_name="Decomposition", startframe=1, endframe=200,
      # forcefields="leaprc.protein.ff14SB"
      /
      &gb
      igb=5, saltcon=0.150,
      /
      # make sure to include at least one residue from both the receptor
      # and peptide in the print_res mask of the &decomp section.
      # this requirement is automatically fulfilled when using the within keyword.
      # http://archive.ambermd.org/201308/0075.html
      &decomp
      idecomp=2, dec_verbose=3, print_res="A/854-862 A/1-853",
      /

      Residue-wise energy decomposition analysis allowed us to identify the key residues that contributed the most to the peptide binding energies. These included residues T1 and V9 in p9 (Table S2), residues T1, R15 and V17 in p17 (Table S3), and residues P10, P11, P19 and P21 in p21 and residue W3726 in the PC1 CTF (Table S4). The energetic contributions of these residues apparently correlated with the sequence coevolution predicted from the Potts model.

      (5) To showcase the reliability of the computational approach, the authors should perform the MD simulation studies with one peptide that did not show any significant agonistic effect in the experiment. This will work as a control for the computational protocol and will demonstrate the utility of the pep-GaMD simulation in this work.

      We appreciate the reviewer’s concern about the lack of a control using peptides that did not induce an agonistic effect. It is difficult for us to add more MD simulations on the other peptides, because the student who performed them has left after completing their PhD. To address the reviewer’s concern, however, we have included more details on the study of the stalkless CTF as a control in the revised manuscript.

      (6) To assess the accuracy of the computational results the authors should mention (either in the main text or SI) whether the reported free energy surfaces were the average of the five simulations or computed from one simulation. In the latter case, free energy surfaces computed from the other four simulations should be provided in the SI. In addition, how many binding unbinding events have been observed in each simulation should be mentioned.

      We appreciate the reviewer’s comment regarding convergence of the simulation free energy surfaces. In response to Reviewer 1, we have calculated free energy profiles of individual simulations for each system, including the p9, p17, and p21 (Figs. S5, S6 and S8). 

      “We have calculated free energy profiles of individual simulations for each system, including the p9, p17, and p21 (Figs. S5, S6 and S8). For the p9 peptide, the “Bound” low-energy state was consistently identified in the 2D free energy profile of each individual simulation (Fig. S5). For the p17 peptide, Pep-GaMD simulations were able to refine the peptide conformation from the “Unbound” to the “Intermediate” and “Bound” states in Sim1 and Sim5, while the peptide reached only the “Intermediate” state in the other three simulations (Fig. S6). For the p21 peptide, Pep-GaMD was able to refine the peptide docking conformation to the “Bound” state in all five individual simulations (Fig. S8).”

      “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”
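      The comparison of per-simulation free energy profiles described above can be sketched with a simple Boltzmann inversion of a histogram along one reaction coordinate. This is only an illustration under strong assumptions: it ignores the boost-potential reweighting that GaMD requires, and the function names, bin width and temperature are hypothetical rather than taken from the authors' workflow.

```python
import math
from collections import Counter

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(samples, bin_width, temperature=300.0):
    """Map each bin's lower edge to F = -kT * ln(P / P_max), in kcal/mol."""
    counts = Counter(math.floor(x / bin_width) for x in samples)
    max_count = max(counts.values())
    kt = KB_KCAL * temperature
    return {
        b * bin_width: -kt * math.log(c / max_count)
        for b, c in sorted(counts.items())
    }


def max_disagreement(profiles):
    """Largest spread in F across replicas, over bins sampled by all of them."""
    shared = set.intersection(*(set(p) for p in profiles))
    if not shared:
        raise ValueError("replicas share no sampled bins")
    return max(
        max(p[b] for p in profiles) - min(p[b] for p in profiles)
        for b in shared
    )
```

      A large value from `max_disagreement` across the five replicas would be one simple indication that the profiles are not converged and should be read semi-quantitatively.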

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) Please provide more background about Rpgrip1l in the introduction, particularly the past studies of the mammalian homolog of Rpgrip1l, if any. Is there any human disease associated with Rpgrip1l? Do these patients have a scoliosis phenotype?

      • We have added more background on the human ciliopathies caused by RPGRIP1L mutations and on their occasional association with early onset scoliosis (lines 45-54 page 2 in the introduction, see cited references). 

      (2) The allele is a large deficiency of most of the coding region of rpgrip1l, can you give details in the Supplementary data of how you show this by genotyping? It would be good to explain that this mutation is most likely behaving as a null, if you have RNAseq data that supports this please note that. Otherwise, it may be incorrect to assume it is a null allele as your shorthand nomenclature states. If you do not have stronger evidence that the deficiency allele is behaving as a null allele, then please think about using an allele nomenclature as outlined at ZFIN:  

      • We now describe in the results section (Lines 72-76, page 3) the extent of the deletion in rpgrip1l ∆/∆ (22 exons out of 26), which creates an early stop at position 88 of 1256 amino acids. We have submitted our two novel mutant lines to ZFIN: rpgrip1l∆ is recorded as rpgrip1l bps1 and rpgrip1l ex4 as rpgrip1l bps2, and we provide this information in the text. Transcriptomics data confirmed that this allele behaves as a null, as the most down-regulated transcript found in the brain of rpgrip1l ∆/∆ fish is the rpgrip1l transcript itself (volcano plot in Fig 5A, described in the results, Lines 270-71, page 9).

      • We have also provided in Supplementary Figure 1 A’ a picture of a typical genotyping gel for the rpgrip1l∆ allele. Sequences of both CRISPR guide RNAs and genotyping primers are provided in the Materials and Methods section.

      (3) Throughout the manuscript, the authors refer to zebrafish mutant phenotypes as "juvenile scoliosis". However, scoliosis may not appear until 11 weeks post-fertilization in some animals. After 6-8 weeks of age, it would be more appropriate to describe the phenotype as "late-onset or adult scoliosis" to differentiate between other reported scoliosis mutants (such as hypomorphic or dominant negative alleles of scospondin) that start body curvatures at 3-5 dpf.

      • We think that rpgrip1l-/- scoliosis can properly be described as a “juvenile scoliosis”, as shown by the time course displayed in Fig 1B: rpgrip1l-/- scoliosis develops asynchronously between 4 weeks and 9 weeks (from 0.8 cm/1 cm to 1.6 cm, corresponding to juvenile stages according to Parichy et al, 2009 PMID: 19891001), after which it reaches a plateau. Half of the mutants are already scoliotic by 5 weeks and no scoliosis develops at adult stage, i.e., from 10 weeks on. We have acknowledged the late-onset scoliosis on page 3, line 93.

      (4) A more careful demonstration of the individual vertebrae, using magnified high-resolution pictures in Figures 1D-G, should be made to more clearly show no obvious vertebral malformations are present. 

      • We now provide a movie in Sup Data that presents 3D views of controls and mutant spines, which show the intervertebral spaces as well as vertebral shape and size. With these images we could exclude vertebral fusion and the presence of dysmorphic vertebrae.

      (5) On page 5: the authors comment on transgenic expression of RPGRIP1L in foxj1a-lineages as "rescuing" scoliosis. This terminology is confusing, as rescuing a condition could be interpreted as inducing it where it was once absent. "Suppressing" scoliosis may be a more appropriate term. 

      • We agree with the reviewers that the term “rescue” is confusing; we have replaced it with “suppress” in the title of the paragraph (line 95 page 3) and within the text (line 115 page 3).

      (6) On page 5, lines 155-156: the authors state that "Indeed, no tissue-specific rescue has been performed yet in zebrafish ciliary gene mutants". This is misleading, as ptk7a and katnb1 mutations both disrupt cilia, and transgenic reintroduction of both ptk7a and katnb1 in foxj1a-expressing lineages has previously been shown to suppress cilia defects as well as scoliosis in these models. The statement should be removed for accuracy.

      • We agree that we were not precise enough in our sentence: when we mentioned “ciliary gene” mutants, we were referring to genes whose products are enriched within cilia and directly affecting ciliogenesis, cilia content and maintenance such as TZ or BBS genes, without encompassing genes like ptk7 and katnb1 whose products perform multiple functions on top of cilia maintenance such as Wnt signalling and remodelling of the whole microtubule network respectively. We have therefore modified our sentence by adding zebrafish ciliary “TZ and BBS” genes (line 104, page 4).

      (7) Figure 2: panels A-B: In the text (line 196) you state that cilia length was increased and that Arl13b content was severely reduced. However, Panel B shows no significant length difference between scoliotic mutants and controls. This statement and graph should be corrected for accuracy. Also, the Arl13b staining is difficult to see in panel A - can channels be split, and/or quantified? 

      • We have now split the Arl13b and glutamylated tubulin channels (Fig 2 A-C”). We think that the reduction of Arl13b staining intensity is now obvious in both straight and scoliotic mutants (compare 2A” with 2B” and 2C”). We were not able to quantify Arl13b staining using ciliary masks from glutamylated tubulin staining, since the two stainings only partially overlap along the length of the cilium, Arl13b being more distal than glutamylated tubulin (Fig 2A’).

      • Ciliary length was significantly increased (from 3.4 to 5.3 µm) in straight rpgrip1l-/-, while the values for scoliotic rpgrip1l-/- were heterogeneous (mean 4.1 µm) and therefore not significantly different from controls. This heterogeneity stems from the combined presence of both shorter and longer cilia in scoliotic fish, which we attribute to the potential breakage over time of the extra-long and thin cilia observed in scoliotic fish (as in Sup figure 1 H’’’, Sup Fig 2M’ and 2O’).

      • We changed the text to be more accurate: we now state that cilia length increased in straight mutants, and became more heterogeneous than in controls in scoliotic mutants (line 143-144, page 5).

      (8) Figure 3: Page 7, line 206: authors state that SCO-spondin secreting cells varied in number along SCO length. What is the evidence that these cells secrete SCO-spondin? The staining shown in Figure 3L-O appears to demonstrate extracellular accumulation of sspo:GFP. What is the evidence that this staining originated from cells in proximity to it? 

      The claim of SCO-secreting cells in Figure 2E-J is confusing. I assume you are using anatomy to infer the SCO is captured in these sections. This should be done in sspo-GFP animals (as in Figure 3) and/or dual antibody labeling can be done to show SCO-secreting cells and cilia.

      • We now show in Supplementary Figure 2 A-D a double staining for Sco-spondin-GFP and cilia (Ac-tub, Glu-Tub). Analyzing GFP staining along SCO length on successive sections, we identified the SCO producing cells on the diencephalic dorsal midline by their position under the posterior commissure (PC, which forms an Acetylated Tubulin positive arch), and counted the nuclei surrounded by cytoplasmic GFP from the most anterior region (24 cells wide, Sup Fig 2A-A’) to the most posterior region (4-8 cells wide, Sup Fig 2 C).

      • Furthermore, the close-ups presented in Fig 2A’ and 2B’ allow detection of the cytoplasmic Sspo-GFP staining around SCO nuclei, above the region presenting primary cilia pointing towards the diencephalic ventricle, both in controls and in mutants at scoliosis onset (tail-up mutants), showing that the extracellular staining in B’ very likely originates from these cells. In these tail-up mutants, extracellular Sspo aggregates have not yet filled the whole diencephalic ventricle as in Fig 3 N and Q.

      (9) Figure 5: Is the transcriptome data and proteomic data consistent for any transcripts and encoded protein products? Please highlight those consistent targets in both analyses. 

      • We would like to emphasize that the transcriptomic study was performed at scoliosis onset, at 5 weeks, while the proteomics analysis was performed at adult stage (3 months) so they cannot be directly compared.

      Moreover, low-abundance proteins (such as centrosomal proteins and transcription factors like Foxj1a) are not detected by label-free proteomics without a prior subcellular fractionation procedure (Lindemann et al, 2017 PMID: 28282288). The extraction protocol also does not allow purification of short neuropeptides such as Urp1-2.

      Nevertheless, we found four targets in common, now highlighted in red in Fig 5, Panel E: Anxa2, complement proteins C4 and C7a, and Stat3, all related to immune response, a GO term enriched in both studies as explained in the text (Lines 308-311, page 10).

      The absence of many inflammation markers or immune response proteins at adult stage in scoliotic mutants most probably indicates a transient inflammatory episode at scoliosis onset, while astrogliosis, as detected by GFAP staining, increases with scoliosis severity. Along the same lines, the two-fold increase of Lcp1 cells within the tectum is present before axis curvature (in straight mutants) and disappears in scoliotic fish (Graph G in Sup Figure S5) as explained in the text, Lines 378-381, page 12.

      (10) Supplementary Figure 1 F-H: What stage/age samples were used for SEM? It is only stated that they were 'adults'. It is also stated that cilia tufts in straight rpgrip1l-/- fish were morphologically normal but 'less dense'- this was not obvious from the figure. Can density be quantified? (otherwise, data does not support the statement). Similarly, can the statement that "cilia of mono-ciliated ependymal cells showed abnormal irregular structures compared to controls, with either bulged or thinner parts" be supported with measurements/quantification? 

      • The SEM study was performed on 3-month-old fish, 3 controls and 5 mutants. We added this information to the figure legend. We could not quantify the number of ciliary tufts in the brain ventricle of the sole straight mutant that was analyzed. We therefore removed the statement that cilia were less dense in the straight mutant. Along the same lines, we now state only that we could find mutant cilia of irregular shape, as shown in Supplementary Figure S1 (F’’, G’’, H’’ and H’’’) (page 4, lines 124-129).

      (11) Supplementary Figure 1D-E is never mentioned in the text. The Supplemental Figure legend also refers to a graph of cilia length that is not in the figure itself. As a result, many of the subsequent panel references are out of register. 

      • We now provide the correct version of the legend and refer to Sup Fig 1D-E in the text (page 3, lines 79-81) and its legend, page 53, lines 1616-1620.

      (12) Supplementary Figure 2A-F: Of interest, in panels C and F, it looks as though sspo:GFP is accumulating on cilia within the ventricles of rpgrip1l mutants. Can this be explored? Is it possible that abnormal aggregation of SSPO on cilia is ultimately leading to cilia loss, as you report for multi-ciliated cells surrounding the subcommissural organ? This could be a very interesting finding and possible mechanism for cilia loss.

      • Our observation of all brain sections led us to conclude that the majority of Sspo-GFP aggregates were floating within the brain ventricles of rpgrip1l-/- fish, while a portion of aggregates were stuck on ventricle walls, in close contact with cilia, as now shown in Supplementary figure S2 B’ and outlined in the legend (page 54, lines 1634-1637). We agree that the contact between Sspo aggregates and cilia might have damaging consequences, either on cilia maintenance or on immune reaction induction, and we now mention these possibilities in the discussion (page 16, lines 524-526). These research lines will be explored in the near future.

      (13) Supplementary Figure 5A-F is not mentioned in the manuscript. Please clarify the role of Anxa2 in neuroinflammation. Is increased Anxa2 expression in rpgrip1l mutant zebrafish reduced after anti-inflammatory drug treatment? What is the expression level of anxa2 in cep290 mutant zebrafish? 

      • We have now added a mention of Supplementary Figure 5A-F in the text (page 10, lines 328-331).

      • Unfortunately, after performing GFAP and Lcp1 staining, we did not have enough histological material to test Anxa2 staining on NACET-treated fish, nor to measure dilatation or quantify multiciliated cells. We agree this would have helped to better define which defects might be an indirect consequence of an inflammatory environment.

      • We tested the expression level of Anxa2 in cep290-/- fish. No labelling above control level was detected on cep290-/- brain sections that were positive for GFAP (N = 5). As GFAP staining in 3-4 week cep290-/- fish was not as intense and widespread as in adult rpgrip1l-/- (50% of GFAP+ cells compared to 100% in the SCO, for example), we concluded that Anxa2 expression may be upregulated after widespread or long-term astrogliosis/inflammation. Alternatively, Anxa2 overexpression could be specific to rpgrip1l-/- fish.

      (14) A summary diagram at the end would be helpful for understanding the main findings. 

      We added a Graphical Abstract summarizing the main conclusions and hypotheses of this study. It is mentioned and explained in the Discussion section, p. 16 lines 504-508 and 516-529. 

      (15) The sspo-GFP zebrafish line should be listed in the STAR methods section: 

      The sspo-GFP line is now listed in the STAR methods, Scospondin-GFPut24, (Troutwine et al., 2020 PMID: 32386529), p.43, last line.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1) The CRISPR screen designed to detect regulators of damage-induced transcriptional repression is based on EU incorporation following a 7-day selection of stable knockout cells. As the authors point out, cell cycle arrest reduces rDNA transcription on its own. The screen, which assesses changes in sgRNA distribution in EU high cells, is thus likely to be dominated by factors that affect cell cycle progression. This is exemplified in the analyses of top hits related to neddylation. The screen's limitations in terms of identifying DDR effectors of damage-induced silencing need to be clearly stated.

      Notably, our screen did identify known DNA damage response effectors of damage-induced silencing; for example, ATM was a top hit, as discussed in the paper and shown in Fig. 5B. We consider that our unbiased approach had advantages because, in addition to finding known DDR effectors, we uncovered novel requirements for transcriptional silencing in response to DNA damage, such as the need for cells to be cycling. We did not find the canonical key cell cycle regulators in our screen. One possibility is that cell cycle arrest or cell death upon their knockdown leads to out-competition during the seven-day treatment with doxycycline, resulting in depletion, rather than enrichment, of the targeting gRNAs among cells that maintain transcription 7 days after DNA damage.

      Comment 2) The authors confirm previous findings of DNA damage-induced repression of rDNA and histone gene transcription. The authors propose that these highly transcribed genes are more susceptible to silencing than the bulk of protein-coding genes and propose a global damage-induced signaling event that is independent of DNA breaks in cis. While this is possible, it is not demonstrated in this manuscript, and the authors should acknowledge alternative explanations. For example, the loci found to be repressed by bulk IR are highly repetitive gene arrays that tend to form nuclear sub-compartments (nucleoli, histone bodies). As such, their likelihood of being in the vicinity of DNA damage is high, at least for a fraction of gene copies. The findings, therefore, remain consistent with cis-induced silencing. Moreover, silencing may spread through the relevant nuclear sub-compartments, consistent with the formation of DNA damage compartments described recently (PMID: 37853125). 

      Our reason for “suggest(ing) that the reduced bulk abundance of nascent transcripts after IR may occur in trans as a programmed event” was the gene length-independent and IR dose-independent nature of the gene silencing (shown in Fig. 2D and Fig. 4C), not that rDNA and histone gene expression went down the most after IR. Indeed, we stated that “Those genes that were normally most highly transcribed were repressed after IR, while genes that were normally expressed at intermediate or low levels tended to be induced after IR (Fig. 4A). The mechanistic reason for this is unclear.” We thank the reviewer for the suggestion that this may be due to these genes existing in nuclear sub-compartments. We have now incorporated this possibility into the discussion.

      Other comments: 

      (1) The statement that silencing is due to transcription initiation rather than elongation is not sufficiently supported by the data. Could equivalent nascent transcript reduction not be the result of the suppression of elongating RNA PolII? To draw the proposed conclusion, the authors would need to demonstrate that RNA PolII initiation is altered, using RNA PolII ChIP and/or analysis of relevant RNA PolII phosphorylation patterns.

      Figure 4F shows the distribution of nascent transcript reads throughout the open reading frame of the repressed genes. It shows that the transcript abundance throughout the ORF, including at the 5’ end, is reduced. This pattern is consistent with a defect in initiation. We have now clarified the description of these results to state that: “Our data is consistent with the possibility that the major mechanism for the repression of the ~1,000 protein coding genes after IR is at the transcriptional initiation stage. However, our data do not rule out that elongation may be additionally repressed after IR, as this would not be observed in our analyses due to concomitant repression of transcriptional initiation.” 

      (2) The lack of rDNA silencing in arrested cells is interesting, though the underlying mechanism remains unclear. To further corroborate the proposed defect in ATM-mediated signaling, the authors should look directly at ATM and Treacle phosphorylation upstream of TOPBP1. 

      We would love to have shown that ATM-dependent phosphorylation does not occur upon IR. We attempted this multiple times, but unfortunately the available phospho-Treacle antibodies were not suitable for rigorous analyses in our hands.

      (3) The "change in relative heights of the EU low (G1) and EU high (S/G2) peaks" in Figures 5D, 5E, and 6B is central to the proposed model of transcriptional changes being affected by cell cycle arrest. These differences should be visualized more clearly and quantified across independent experiments. Ideally, the cell cycle stage should be dissected as in Figure 2B. How do the authors envision cell cycle arrest triggers the defect in transcriptional silencing? 

      In the previous version, the last paragraph described one possibility for how rDNA may fail to be repressed in arrested cells after IR, based on the results shown in Fig. 7F and G.  We have now added a paragraph in the discussion section beginning “Why would cell cycle arrest in G1 or G2 phases of the cell cycle prevent transcriptional repression of rDNA and histone genes after IR?”

      Reviewer #2:

      (1) Define ERCC normalization. 

      We apologize for this omission. We have now explained ERCC normalization and have added a citation to a commentary on spike-in controls that we wrote in 2015 for further explanation.

      (2) On page 8, the authors speculate that genes involved in immune response after IR was activated due to cytoplasmic DNA in pre-B cells. Where are these cytoplasmic DNAs from? Is there any literature indicating that a 30-minute IR treatment can induce cytoplasmic DNA?
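      For readers unfamiliar with the idea, spike-in normalization can be sketched as follows. This is a minimal illustration, assuming the same amount of ERCC spike-in RNA is added per cell to every sample; the function names and sample labels are hypothetical, and the manuscript should be consulted for the procedure actually used.

```python
def spike_in_scale_factors(spike_counts):
    """Per-sample factors that equalize total spike-in reads across samples.

    spike_counts maps a sample name to its total reads mapped to spike-ins.
    """
    reference = sum(spike_counts.values()) / len(spike_counts)
    return {sample: reference / count for sample, count in spike_counts.items()}


def normalize(gene_counts, factors):
    """Scale each sample's gene counts by that sample's spike-in factor."""
    return {
        sample: {gene: n * factors[sample] for gene, n in genes.items()}
        for sample, genes in gene_counts.items()
    }
```

      If a hypothetical irradiated sample yields twice as many spike-in reads as the control for the same raw gene count, the normalized gene level in the irradiated sample is half that of the control. This is how a global drop in transcription becomes visible, whereas normalizing to total library size would hide it.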

      We have removed this speculation, as there is no evidence currently to support it.

      (3) Related to the points above, are ERVs or repetitive DNA elements up-regulated upon IR treatment, which in turn results in increased expression of genes involved in immune response? 

      The induction of cytokines as a rapid response to irradiation is a major part of the immediate early gene program induced in response to ROS (and now is explained in the manuscript).

      (4) Please explain in the result section how overall levels of transcription determined by EU are reduced after IR, and yet the number of genes with increased expression upon IR treatment is much more than that of genes with reduced expression.

      We have explained that while fewer genes have reduced expression after IR than have increased expression, the genes with reduced expression are extremely highly expressed to begin with. As a result, the bulk amount of transcripts is reduced after IR.
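      The arithmetic behind this point can be made concrete with a toy calculation. The gene numbers and expression levels below are entirely hypothetical and are not taken from the study; they only show that bulk transcript abundance can fall even when more genes are induced than repressed.

```python
def bulk_abundance(groups):
    """Total transcript abundance for (number_of_genes, level_per_gene) groups."""
    return sum(n_genes * level for n_genes, level in groups)


# Hypothetical scenario: 1,000 highly expressed genes are halved after IR,
# while 3,000 lowly expressed genes double.
before = [(1000, 1000.0), (3000, 10.0)]
after = [(1000, 500.0), (3000, 20.0)]

total_before = bulk_abundance(before)   # 1,030,000 arbitrary units
total_after = bulk_abundance(after)     # 560,000 arbitrary units
```

      In this toy case three times as many genes go up as go down, yet the total drops by almost half, because the repressed genes account for most of the transcripts to begin with.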

      (5) Do cells treated with MLN4924 block the down-regulation of histone genes and ribosomal genes? 

      We have not addressed this directly. However, given that the reduction of gene expression that occurs after IR is largely due to repression of histone and rDNA genes, it is safe to speculate that these are the genes that are no longer repressed during cell cycle arrest.

      (6) Is IR-induced down-regulation of histone genes due to cell cycle changes? 

      We do not know for sure if this is the case. It is relevant to note that even without IR, histone expression per se is regulated by cell cycle changes, being lower outside of S phase – and the majority of  non-arrested cells in our study are in S phase (Fig. 2B). As such, arrest of cells per se outside of S phase would be sufficient to reduce histone expression level.

      We would like to thank the reviewers again for their insightful suggestions and comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript dissects the contribution of the CaBP 1 and 2 on the calcium current in the cochlear inner hair cells. The authors measured the calcium current inactivation from the double knock-out CaBP1 and 2 and showed that both proteins contribute to voltage-dependent and calcium-dependent inactivation. Synaptic release was reduced in the double KO. As a consequence, the authors observed a depressed activity within the auditory nerve. Taken together, this study identifies a new player that regulates the stimulation-secretion coupling in the auditory sensory cells. 

      Strengths: 

      In this study, the authors bring compelling evidence that CaBP 1 and 2 are both involved in the inactivation of the calcium current, from cellular up to system level, and by taking care to probe different experimental conditions such as different holding potentials and by rescuing the phenotype with the re-expression of CaBP2. Indeed, while changing the holding potential worsens the secretion, it completely changes the kinetics of the inactivation recovery. It alerts the reader that probing different experimental conditions that may be closer to physiology is better suited to uncovering any deleterious phenotype. This gave pretty solid results. 

      Weaknesses: 

      Although this study clearly points out that CaBP1 is involved in the calcium current inactivation, it is not clear how CaBP1 and CaBP2 act together (but this is probably beyond the scope of the study). Another point is that the authors re-express CaBP2 to largely rescue the phenotype in the double KO but no data are available to know whether the re-expression of both CaBP1 and CaBP2 would achieve a full recovery and what would be the effect of the sole re-expression of CaBP1 in the double KO.

      We would like to thank the reviewer for the appreciation of our work. We agree that the effect of the sole re-expression of CaBP1 in the double KO remains elusive and have planned to address this question in a follow-up study. 

      Reviewer #2 (Public Review): 

      Summary: 

      In the manuscript by Oestreicher et al, the authors use patch-clamp electrophysiology, immunofluorescent imaging of the cochlea, auditory function tests, and single-unit recordings of auditory afferent neurons to probe the unique properties of calcium signaling in cochlear hair cells that allow rapid and sustained neurotransmitter release. The calcium-binding proteins (CaBPs) are thought to modify the inactivation of the Cav1.3 calcium channels in IHCs that initiate vesicle fusion, reducing the calcium-dependent inactivation (CDI) of the channels to allow sustained calcium influx to support neurotransmitter release. The authors use knockout mice of Cabp1 and Cabp2 in a double knockout (Cabp1/2 DKO) to show that these molecules are required for enabling sustained calcium currents by reducing CDI and enabling proper IHC neurotransmitter release. They further support their evidence by re-introducing Cabp2 using an injection of AAV containing the Cabp2 sequence into the cochlea, which restores some of the auditory function and reduces CDI in patch-clamp recordings. 

      Strengths: 

      Overall the data is convincing that Cabp1/2 is required for reducing CDI in cochlear hair cells, allowing their sustained neurotransmitter release and sound encoding. Figures are well-prepared, recordings are careful and stats are appropriate, and the manuscript is well-written. The discussion appropriately considers aspects of the data that are not yet explained and await further experimentation.

      Weaknesses: 

      There are some sections of the manuscript that pool data from different experiments with slightly different conditions (wt data from a previous paper, different calcium concentrations, different holding voltages, tones vs clicks, etc). This makes the work harder to follow and more complicated to explain. However, the major conclusion, that cabp1 and 2 work together to reduce calcium-dependent inactivation of L-type calcium channels in cochlear inner hair cells, still holds. 

      Another weakness is that the authors used injections of AAV-containing sequences for Cabp2, but do not present data from sham surgeries. In most cases, the improvement of hearing function with AAV injection is believable and should be attributed to the cabp2 function. However, in at least one instance (Figure 4B), the results of the AAV injection experiments may be overinterpreted - the authors show that upon AAV injection, the hair cells have a much longer calcium current recovery following a large, long depolarization to inactivate the calcium channels. Without comparison to sham surgery, it is not known if this result could be a subtle result of the surgery or indeed due to the Cabp2 expression.  It would be great to see the auditory nerve recordings in AAV-injected animals that have a recovery of ABRs. However, this is a challenging experiment that requires considerable time and resources, so is not required.

      We would like to thank the reviewer for the appreciation of our work. We agree with the reviewer that sham surgery could provide additional information that would benefit the interpretation of our data. The recovery experiments were very tedious, and these long patch-clamp paradigms required extremely stable recordings. Based on our observations, we plan to address the recovery kinetics in more detail in the follow-up study. We consider side effects of the surgery (which may mainly affect middle ear function) and of the empty AAV vector on inner hair cell calcium current recovery rather unlikely, but we cannot exclude them. We have therefore added a sentence to the discussion to point out this caveat. Based on previously published data on the effect of PHP.eB-Cabp2eGFP in WT animals, we expect some (mild) adverse effects on hearing from overexpression of CaBP2 and/or eGFP in the inner ear. In the future, we thus plan to further optimize the treatment. As for the in vivo recordings from the auditory nerve fibers of the rescued mice, we could not agree more. These recordings are planned for the follow-up study.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors attempted to unravel the role of the Ca2+-binding proteins CaBP1 and CaBP2 for the hitherto enigmatic lack of Ca2+-dependent inactivation of Ca2+ currents in sensory inner hair cells (IHCs). As Ca2+ currents through Cav1.3 channels are crucial for exocytosis, the lack of inactivation of those Ca2+ currents is essential for the indefatigable sound encoding by IHCs. Using a deaf mouse model lacking both CaBP1 and CaBP2, the authors convincingly demonstrate that both CaBP1 and CaBP2 together confer a lack of inactivation, with CaBP2 being far more effective. This is surprising given the mild phenotype of the single knockouts, which has been published by the authors before. Readmission of CaBP2 through viral gene transfer into the inner ear of double-knockout mice largely restored hearing function, normal Ca2+ current properties, and exocytosis. 

      Strengths: 

      (1) In vitro electrophysiology: perforated patch-clamp recordings of Ca2+/Ba2+ currents of inner hair cells (IHCs) from 3-4 week-old mice - very difficult recordings - necessary to not interfere with intracellular Ca2+ buffers, including CaBP1 and CaBP2. 

      (2) Capacitance (exocytosis) recordings from IHCs in perforated patch mode. 

      (3) The insight that a negative holding potential might underestimate the impact of lack of CaBP1/2 on the inactivation of ICa in IHCs. As the physiological holding potential is much more positive than a preferred holding potential in patch clamp experiments it has a strong impact on inactivation in the pauses between depolarization mimicking receptor potentials. This truly advances our thinking about the stimulation of IHCs and accumulating inactivation of the Cav1.3 channels. 

      (4) Insight that the voltage sine method with usual voltage excursions (35 mV) to determine the membrane capacitance (for exocytosis measurements) also favors the inactivated state of Cav1.3 channels 

      (5) Use of double ko mice (for both CaBP1 and CaBP2, DKO) and use of DKO with virally injected CaBP2eGFP into the inner ear. 

      (6) Use of DKO animals/IHCs/SGNs after virus-mediated CaBP2 gene transfer shows a great amount of rescue of the normal ICa inactivation phenotype.

      (7) In vivo measurements of SGN AP responses to sound, which is highly demanding. 

      (8) In vivo measurements of hearing thresholds, DPOAE characteristics, and ABR wave I amplitudes/latencies of DKO mice and DKO+injected mice compared to WT mice. 

      Very thorough analysis and presentation of the data, excellent statistical analysis.

      The authors achieved their aims. Their results fully support their conclusions. The methods used by the authors are state-of-the-art. 

      The impacts on the field are the following:

      Regulation of inactivation of Cav1.3 currents is crucial for the persistent functioning of Cav1.3 channels in sensory transduction. 

      The findings of the authors better explain the phenotype of the human autosomal recessive DFNB93, which is based on the malfunction of CaBP2. 

      Future work - by the authors or others - should address the molecular mechanisms of the interaction of CaBP1 and 2 in regulating Cav1.3 inactivation. 

      Weaknesses: 

      I do not see weaknesses. 

      What is not explained (but was not the aim of the authors) is how the CaBPs 1 and 2 interact with the Cav1.3 channels and with each other to reduce CDI. Also unexplained is why DFNB93, which is based on mutation of the CaBP2 gene, leads to a severe phenotype in humans in contrast to the phenotype of the CaBP2 ko mouse.

      We would like to thank the reviewer for the appreciation of our work and of the amount of effort that went into these experiments. These are questions that we are asking ourselves as well, and we would like to address them in the future.

      Recommendations for the authors:

      Reviewing editor: 

      In the Introduction, the authors may also mention that Ca2+-dependent and voltage-dependent inactivation of L-type Ca channels has been reported at ribbon synapses of retinal bipolar cells (see von Gersdorff & Matthews, J Neurosci. 1996, 16(1):115-122). These are critical retinal interneurons involved in the continuous exocytosis of synaptic vesicles onto retinal ganglion cells. 

      We would like to thank the reviewing editor for pointing that out; we have added the reference in the revised version of the manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Conditions worsen with age but no numbers regarding the threshold shift are provided. 

      For better readability, we have now included click threshold values for both genotypes and age groups in the Results section of the MS text.

      Do the authors correlate the re-expression level of CaBP2 using GFP to the rescuing phenotype (for exocytosis or BK channels immunostaining)?

      The restoration of BK expression in the virus-treated IHC was a side observation of our study, which was not performed in sufficient replicates for proper quantification. In the future, we will address this question in greater detail, possibly with improved viral constructs. In a previous study, we attempted to correlate eGFP fluorescence intensity with residual depolarization-evoked calcium current in CaBP2-injected IHC of Cabp2 single KO animals. At that time, we were unable to establish a convincing correlation. This could be related to (i) large variability in the data, possibly requiring much larger datasets to observe a potential correlation above the noise, (ii) variable imaging conditions from prep to prep, or (iii) additional parameters that could influence the outcome of the current rescue, e.g. uncontrolled expression of the transgene. However, we did analyse the correlation between ABR click thresholds and mean IHC eGFP fluorescence in another, preliminary set of data that included different viruses at different titres. There, we were able to observe a relatively good correlation. Interestingly, some of the highest expression levels resulted in poorer threshold recovery, which could indicate harmful overexpression. Moreover, the correlation was only detected when the difference in the mean eGFP expression levels per organ was large. Furthermore, significantly less efficient ABR threshold recovery was observed in the non-injected contralateral ears, which showed significantly lower viral expression of the transgene. In our follow-up study, we will investigate the dose dependence of the rescue in more detail.

      Reviewer #2 (Recommendations For The Authors): 

      -  There are two paragraphs in the results text about supplemental figure #2, which suggests that it should be moved to the main figures. 

      We would like to thank the reviewer for this suggestion. Figure S2 has now been moved to the main figures (as current Figure 5) and has been modified to accommodate the BK cluster analysis panel. The histogram with the number of ribbon synapses was removed as the data was redundant with the numbers given in the MS text.  

      -  Overall it is hard to distinguish between dark blue and black in many figures, including the dual-color asterisks.

      To improve the readability and clarity of the figures, we replaced dark blue with magenta. Dual-color asterisks in Fig. 3 were changed to single-color asterisks, and what they refer to is explained in the figure legend.

      -  Figure 4 legend - there is a mis-spelling of cabp in the fourth line from the bottom. 

      -  Figure 4 legend - the last line does not make sense - describes recovery as being both 'much faster' and 'slowest'.

      -  Figure 6 title - consider removing 'nearly blocked' and replacing it with 'impaired'.

      We would like to thank the reviewer for noticing these mistakes, which have been corrected in the revised version as suggested.

      -  The calculations of VDI and CDI could be better explained, specifically detailing that VDI is calculated first from currents using barium as a divalent, followed by the calculation of CDI. 

      We included an explanatory sentence in the results section as suggested and are additionally referring the readers to the methods section for the mathematical formulas.

      -  Why were two different tests (one parametric and one non-parametric) used for the Figure 3B data? 

      We performed a point-by-point comparison of the data. The choice of test was based on the distribution and the variance of the data points. We have now opted for a unified test, the t test with Welch correction, which assumes that samples come from normally distributed populations but makes no assumption of equal variances. The outcomes of these tests were similar.
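
      For readers unfamiliar with the Welch correction mentioned above, the following is a minimal illustrative sketch, not the analysis code used for Figure 3B. The function name `welch_t` and the example values are hypothetical. It computes only the test statistic and the Welch-Satterthwaite degrees of freedom; a p-value would additionally require the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical example values, for illustration only
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

      When both groups have equal size and equal variance, the degrees of freedom reduce to those of the standard two-sample t test; with unequal variances they are smaller, which is what makes the test more conservative.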

      -  The much broader tuning of the auditory nerve fibers is interesting, consider including this in a figure. 

      For recording tuning curves, we use an automated algorithm which adapts the tone burst intensity and frequency depending on the preceding results. The threshold criterion is an increase of spiking by 20 Hz above spontaneous rate. This routine works fairly well in wild-type animals. However, DKO SGNs typically had very high thresholds at >80 dB across all frequencies, which can partly be explained by the fact that they had very low spike rates and did not reach that criterion. Besides tuning curve runs, we also tried systematic frequency sweeps and manual frequency control to determine a best frequency, followed by a rate intensity function at that frequency to determine “best threshold”. 

      All this was difficult, because in the DKO SGNs, sound threshold detection was challenged by the strong dependence of spiking on the duration of the preceding silent interval. A preceding stimulus outside the frequency response area or below the activation threshold of the SGN would thus improve spiking by allowing for longer recovery, while a preceding efficient stimulus would reduce it. Thus, the sound threshold determined in a rate level sweep varied depending on the interstimulus interval and possibly even on the (randomized) order at which the intensities were played. 

      A meaningful threshold measure would require long silent interstimulus intervals, i.e. a long recording time. As tuning curves require multiple threshold measures, it seemed impossible to obtain a useful dataset at high quality. As we deemed the spike rate dependence on interstimulus intervals more important than the tuning, we focused instead on tone burst responses acquired at frequency/intensity combinations at which the hair cells and their synapses were maximally activated. In wild-types, these would be tone bursts at characteristic frequency or noise bursts in the saturated part of the rate intensity function, which typically has a dynamic range of 10-25 dB. As we assume (based on DPOAE) that cochlear micromechanics and amplification are mostly normal in the DKOs, we hypothesize that the sensitivity and dynamic range of basilar membrane motion and inner hair cell transduction are normal and that the increase in single unit thresholds and loss of sharp tuning are another readout of synaptic dysfunction. 
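
      The rate-based threshold criterion described above (an increase of spiking by 20 Hz over the spontaneous rate) can be sketched as follows. This is a simplified illustration that assumes a fixed rate-level sweep rather than the adaptive routine used in the recordings; the function name, parameters, and example rates are hypothetical.

```python
def rate_threshold(levels_db, rates_hz, spont_rate_hz, criterion_hz=20.0):
    """Return the lowest sound level (dB) at which the driven rate exceeds the
    spontaneous rate by at least `criterion_hz`, or None if it never does."""
    for level, rate in sorted(zip(levels_db, rates_hz)):
        if rate - spont_rate_hz >= criterion_hz:
            return level
    return None  # criterion never reached, e.g. in units with very low spike rates


# Hypothetical rate-level functions, for illustration only
levels = [10, 20, 30, 40]
robust_unit = rate_threshold(levels, [5, 12, 30, 80], spont_rate_hz=5)
sparse_unit = rate_threshold(levels, [1, 2, 5, 15], spont_rate_hz=1)
```

      In this toy example the sparsely firing unit never meets the fixed criterion even though its rate clearly grows with level, which mirrors why an absolute rate criterion can report very high or undefined thresholds for low-rate units.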

      - Figure S2 - please show separate panels for each channel, it is very difficult to make out the changes by eye in the merged panels. 

      Done.  

      - Figure S2 G - the results text stated that the BK channel clusters 'appeared' smaller - why was this not measured? 

      We have performed additional experiments to enable proper analysis of the BK channel clusters. The analysed data show that the BK clusters are considerably larger and more abundant in WT than in CaBP1/2-deficient IHCs of approx. 4-week-old mice. The results of the analysis are included in the immunohistochemistry figure (now Fig. 5) and are discussed further in the results section.

      Reviewer #3 (Recommendations For The Authors): 

      I have only a few minor points on the MS: 

      (1) Some labels in Figure 1 are too small and hard to read, e.g. y-axis in B-F. Wherever you use subscripts on the axes, the labeling needs to be larger.

      (2) Fig. 1A: the colors for CaM and CaBP1.2 are too similar, at least on my printout. Please use more distant colors.

      (3) Reference 24 should be corrected (no longer in press).

      These points have been addressed in the revised version of the MS.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study examines the role of a host in conditions that shift pathogenicity of opportunistic microbes. The use of single-cell microbial transcriptomics and metabolomics to demonstrate the host's effects on pathogen dynamics is interesting and convincing. However, the connection to host antimicrobial peptides driving these effects is incomplete and would benefit from additional evidence and improved explanation in the text. This paper has the potential to be of broad interest to those working in host-microbe (microbiome and pathogen) interactions.

      We thank the editors for handling our manuscript and providing the eLife assessment. We went through each comment and carried out the necessary experiments. In response to the comments, we provide additional evidence in this revised manuscript that further supports our findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Wang and colleagues used Drosophila-Serratia as a host-microbe model to investigate the impact of the host on gut bacteria. The authors showed that Drosophila larvae reduce S. marcescens abundance in the food likely due to a combination of mechanical force and secretion of antimicrobial peptides. S. marcescens exposed to Drosophila larvae lost virulence to flies and could promote larval growth similar to typical Drosophila gut commensals. These phenotypic changes were reflected in the transcriptome and metabolome of bacteria, suggesting that the host could drive the switch from pathogenicity to commensalism in bacteria. Further, the authors used single-cell bacterial RNA-seq to demonstrate the heterogeneity in gut bacterial populations.

      Strengths:

      This is a valuable work that addresses an important question of the effect of the host on its gut microbes. The authors could convincingly demonstrate that gut bacteria are strongly affected by the host with important consequences for both interacting partners. Moreover, the authors used state-of-the-art bacterial single-cell RNA-seq to reveal heterogeneity in host-associated commensal populations.

      Weaknesses:

      Some of the conclusions are not fully supported by the data.

      Specifically, in lines 142-143, the authors claim that larva antagonizes the pathogenicity of S. marcescens based on the survival data. I do not fully agree with this statement. An alternative possibility could be that, since there are fewer S. marcescens in larvae-processed food, flies receive a lower pathogen load and consequently survive. Can the authors rule this out?

      Also, the authors propose that Drosophila larvae induce a transition from pathogenicity to commensalism in S. marcescens and provide nice phenotypic and transcriptomic data supporting this claim. However, is it driven only by transcriptional changes? Considering high mutation rates in bacteria, it is possible that S. marcescens during growth in the presence of larvae acquired mutations causing all the observed phenotypic and transcriptional changes. To test this possibility, the authors could check how long S. marcescens maintains the traits it acquires during growth with Drosophila. If these traits persist after reculturing isolated bacteria, it is very likely they are caused by genome alterations, if not - likely it is a phenotypic switch driven by transcriptional changes.

      We thank the reviewer for providing a feasible method to distinguish a shift in transcriptional profile from genomic mutations. Following this valuable suggestion, we checked phenotypic and transcriptional changes after re-culturing the bacterium that had coexisted with larvae. We found that all phenotypes were recovered after re-culturing. The new data support our previous result that the phenotypic switch was driven by transcriptional changes rather than genome mutations. We have added these results to the text together with figure supplement 3 (lines 147-151 and 192-194). Please see the following text.

      “To rule out the possibility that phenotypic alterations could stem from genomic mutations, we examined the prodigiosin yield and CFUs of re-cultured S. marcescens that had coexisted with larvae. Our results showed that neither prodigiosin yield nor CFUs of re-cultured S. marcescens differed from the original strain (Figure 2-figure supplement 3A-C), suggesting that a phenotypic switch was driven primarily by transcriptional reprogramming.” “Consistent with the previous result that this phenotypic switch was driven by transcriptional changes, the expression of virulence and growth genes was recovered after re-culturing (Figure 3-figure supplement 3D, E).”

      Regarding the first question, we acknowledge the possibility that the high mortality of flies could result from the acquisition of a higher pathogen load, because the bacterial load of S. marcescens alone was increased. However, host pathogenesis is normally determined by the virulence of pathogens rather than by the number of bacteria. For example, hosts constantly harbor astonishing numbers of commensals in their guts but remain healthy. This suggests that the property (virulence) of a pathogen is more important in determining the health status of the host. Moreover, an increase in the virulence of S. marcescens alone was verified by real-time PCR (Fig. 2F) and TE (Fig. 2G). Taken together, we conclude that the impaired survival of flies challenged with S. marcescens alone mainly arose from an increase in the virulence of S. marcescens. Thanks for your understanding!

      Reviewer #2 (Public Review):

      Summary:

      While many studies have explored the impacts of pathogens on hosts, the effect of hosts on pathogens has received less attention. In this manuscript, Wang et al. utilize Drosophila melanogaster and an opportunistic pathogen, Serratia marcescens, to explore how the host impacts pathogenicity. Beginning with an observation that larval presence and density impacted microbial growth in fly vials (which they assess qualitatively as the amount of 'slick' and quantitatively as microbial load/CFUs), the authors focus on the impact of axenic/germ-free larvae on an opportunistic pathogen S. marcescens. Similar to their observations with general microbial load, they find that larvae reduce the presence of a pinkish slick of Sm, indicative of its secondary metabolite prodigiosin. The presence of larvae alters prodigiosin production, pathogen load, pathogen cellular morphology, and virulence, and this effect is through transcriptional and metabolic changes in the pathogen. Overall, they observe a loss of virulence factors/pathways and an increase in pathways contributing to growth. Given the important role the host plays in this lifestyle shift, the authors then examined host features that might influence these effects, focusing on the role of antimicrobial peptides (Amps). The authors combine the use of synthetic Amps and an Amp-deficient fly line and conclude much of the larval inhibitory effect is due to their production of AMPs.

      Strengths:

      This is a very interesting question and the use of Drosophila-Serratia marcescens is a great model to explore these interactions and effects.

      The authors have an interesting and compelling phenotype and are asking a unique question on the impact of the host on the pathogen. The use of microbial transcriptomics and metabolomics is a strength, especially in order to assess these impacts on the pathogen level and at the single-cell level to capture heterogeneity.

      Weaknesses:

      Overall, the writing style in the manuscript makes it difficult to fully understand and appreciate the data and its interpretation.

      The data on the role of AMPs would benefit from strengthening. Some of the arguments in the text of that section are also counterintuitive. The authors show that △AMP larvae have a reduced impact on Sm as compared to wt larvae, but it seems less mild of an effect than that observed with wt excreta (assuming the same as secreta in Figures 7, should be corrected or harmonized). Higher doses of AMPs give a phenotype similar to wt larvae, but a lower dose (40 ng/ul) gives phenotypes more similar to controls. The authors argue that this data suggests AMPs are the factor responsible for much of the inhibition, but their data seems more to support that it's synergistic- you seem to still need larvae (or some not yet defined feature larvae make, although secreta/excreta was not sufficient) + AMPs to see similar effects as wt. Based on positioning and color scheme guessing that AMP 40ng/ul was used in Figures 7D-H, but could not find this detail in the text, methods, or figure legend and it should be indicated. This section does not seem to be well supported by the provided data, and this inconsistency greatly dampened this reviewer's enthusiasm for the paper.

      We thank the reviewer for the valuable comments and suggestions. We admit that some photos of the pinkish slick (prodigiosin) in Figure 7 and in figure supplement 2B are counterintuitive. The reason is as follows. S. marcescens alone produced prodigiosin that stayed only on the surface of the fly agar medium. Larvae, by contrast, agitate the food and stratify the prodigiosin, so that even a higher prodigiosin yield inside the food can look lighter than a surface slick. We mentioned this in the previous manuscript (lines 166-168). This is why some photos of samples treated with excreta or a lower dose of AMP seemed more intense than those with WT larvae. However, we precisely quantified the prodigiosin yield inside the food with a spectrophotometer, and we report this yield alongside the photos of the slick. We therefore drew our conclusions mainly from the quantification of the prodigiosin yield. We used cecropin A for our experiments, and we have added this information to the text. We hope that our replies can reignite your enthusiasm for our manuscript, and thanks for your great support!

      Reviewer #3 (Public Review):

      In this study, Wang and coworkers established a model of Drosophila-S. marcescens interactions and thoroughly examined host-microbe bidirectional interactions. They found that:

      (1) Drosophila larvae directly impact microbial aggregation and density;

      (2) Drosophila larvae affect microbial metabolism and cell wall morphology, as evidenced by reduced prodigiosin production and EPS production, respectively;

      (3) Drosophila larvae attenuate microbial virulence;

      (4) Drosophila larvae modulate the global transcription of microbes for adaptation to the host;

      (5) Microbial single-cell RNA sequencing (scRNA-seq) analysis revealed heterogeneity in microbial pathogenicity and growth;

      (6) AMPs are key factors controlling microbial virulence phenotypes.

      Taken together, they concluded that host immune factors such as AMPs are directly involved in the pathogen-to-commensal transition by altering microbial transcription.

      General comments:

      In general, this study is intriguing as it demonstrates that host immune effectors such as AMPs can serve as critical factors capable of modulating microbial transcription for host-microbe symbiosis. However, several important questions remain unanswered. One such question is: What is the mechanism by which AMPs modulate the pathogen-to-commensal transition? One hypothesis suggests that antimicrobial activity may influence microbial physiology, subsequently modulating transcription for the transition from pathogen to commensal. In this context, it is imperative to test various antibiotics with different modes of action (e.g., targeting the cell wall, transcription, or translation) at sub-lethal concentrations to determine whether sub-lethal doses of antimicrobial activity are sufficient to induce the pathogen-to-commensal transition.

      Thank you for the important comments on our manuscript. We checked the effect of antibiotics (5 μg/μl kanamycin and 10 μg/μl ampicillin) on the virulence switch of S. marcescens. We found that both antibiotics at sub-lethal doses similarly decreased the prodigiosin yield and the expression of virulence genes in S. marcescens. Intriguingly, both antibiotics also caused a dramatic decline in the bacterial load and in the expression of genes involved in cell growth. These results suggest that antibiotics reduced virulence primarily by suppressing most bacterial activities.

      We found that larvae and AMPs at 40 μg/μl resulted in a modest decrease in bacterial load and an increase in the relative expression of genes involved in cellular proliferation, suggesting that AMPs could maintain the exponential phase of bacterial growth. This result is consistent with the finding that Drosophila larvae can support the long-term persistence of commensals in the shared habitat (DOI: 10.1016/j.cmet.2017.11.011). It is likely that, by keeping S. marcescens in the exponential phase of cell growth, AMPs prevent the bacteria from rapidly exhausting their nutritional resources and consequently maintain symbiosis.

      Author response image 1.

      (A) Representative images of surface slick with S. marcescens alone, with kanamycin (5 μg/μl) and ampicillin (10 μg/μl). (B) The prodigiosin production of S. marcescens alone, with kanamycin (5 μg/μl) and ampicillin (10 μg/μl). n = 6 for each. (C) Bacterial loads of S. marcescens alone, with kanamycin (5 μg/μl) and ampicillin (10 μg/μl). n = 6 for each. (D, E) RT-qPCR analysis of the expression levels of downregulated and upregulated genes in S. marcescens alone, with kanamycin (5 μg/μl) and ampicillin (10 μg/μl). n = 3 for each. Means ± SEMs. Variables with different letters are significantly different (p < 0.05); variables that share a letter are not significantly different (p > 0.05). ns, no significance. Kruskal-Wallis test followed by Dunn’s multiple comparisons test.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific points that need to be addressed:

      (1) Lack of statistical analysis for many figures. The authors should perform and report the statistical analysis for all figures where it is currently lacking, specifically, Figures 2C, D, E, F, H; Figures 3E, F; Figures 7G, H; Figure S2E, Figures S3D, E.

      Thanks for your valuable suggestions. We re-checked the manuscript and performed the statistical analysis for these figures.

      (2) For graphs showing dots, it should be specified what exactly individual dots show and how many animals were used per replicate. Also, time points at which specific analysis was performed should be specified.

      We have provided this information in the legends of the revised manuscript.

      (3) Figure 2. No letters illustrating statistical significance are shown, although this is claimed in the legend (line 848).

      We added statistical significance in the updated Figure 2.

      (4) In Figure 7, the authors used AMPs of defined concentration, but it is not specified what exactly these AMPs are. Please provide the full composition of the AMP mix used.

      We used the antimicrobial peptide cecropin A produced by a silkworm. We added this information to the Methods (lines 487-488) and the Figure 7 legend.

      (5) Figure S2B. To me, it looks like that medium with larvae is redder than after mechanical force. I find it hard to believe the quantification in panel C that the medium with larvae has 3 times less pigment as compared to the mechanical force.

      Larvae could only agitate the surface of the food (~0.4 cm), whereas sticks completely agitated the food down to 3 cm. Thus, the layer of food containing pink pigment was much deeper after mechanical agitation than with larvae, which accounts for the counterintuitive appearance. We explained this in the previous manuscript (lines 166-168): “Of note, the surface of the slick with agitation appeared lighter than that of larvae, mainly due to a stratification of prodigiosin following agitation.”

      (6) The authors need to proofread the manuscript as there are missing words, terms that need definition, and wrong terms. For example, L86 - naked eye?, L117 - what do the authors mean by co-culture?, L309 - not resist but rather combat, L347 - Species? or competition?, Figure 2A - 2nd?

      We have corrected these errors in the new manuscript. We added an "eye" in L86. Co-culture means “S. marcescens in co-culture”. Interspecies competition for nearly the same or similar nutrients and space occurs in the habitat.

      (7) The authors should reorganize either the text or the figures' order in a way that the figures are described in a consecutive order (Figure 1A, B ... and not Figure 1D first and then 1A).

      Thanks for your valuable advice. We reorganized the order of the text.

      (8) Do the authors have an idea which bacteria they quantified in Figures 1E to 1G? I didn't find the medium that was used for culturing. Also, in Figure 1F, Is the control group comprised of females or males?

      Mixed bacteria (bacteria in the living environment of Drosophila) were quantified on the NA medium that supports the growth of Drosophila microbiota (Jia Y, et al. Nat Commun. 2021); see lines 474-475. The control group comprised both males and females at a 1:1 ratio. Similarly, the aged group contained 100 50-day-old flies, male:female = 1:1. We provided details in the Figure 1 legend (lines 849-850 and 851-852).

      (9) L118-129. it is not possible to make all these statements without any statistical analysis. To me, at 96h both treatments have the same CFUs, while the authors claim they are different.

      We added statistical analysis in the current version. In fact, S. marcescens alone collapsed after 72 h post-inoculation, and its CFU number declined step by step. The bacterial load of S. marcescens in co-culture was comparable to (at 96 h post-inoculation, p>0.05) or higher than (at 120 h post-inoculation, p<0.001) that of S. marcescens alone, possibly because the bacteria alone rapidly exhausted the nutritional resources and collapsed through population suicide. We rewrote this sentence (lines 125-129) in the updated manuscript.

      (10) L136. term "symbionts" is not appropriate here.

      We changed “symbionts” to “S. marcescens”.

      (11) In Figure 1, the authors used flies of different fitness: weak, strong, and infertile. They should be specific and describe exactly what these terms mean, are these mutants or treatments that affect the fitness?

      We apologize for the missing information and have added it to the Methods and legend: strong flies (wild-type CS), weak flies (yw; Sp/CyO; MKRS/TM6B), and infertile flies (dfmr150M null mutant); see the Figure 1 legend, lines 849-850.

      (12) Figure S2. The title of this figure is misleading, please modify it. Mechanical force did affect S. marcescens but to a lesser degree as compared to larvae.

      Thank you for your suggestion. We admit that mechanical force affected S. marcescens but to a lesser degree as compared to larvae, so we changed the title to "Biological factors mainly determine S. marcescens lifestyle."

      Reviewer #2 (Recommendations For The Authors):

      General improvement to writing and presentation (see below):

      Describing confluent growth would make more sense than 'slick' and then using descriptions of broken, etc. "colour intensity of the surface slick".

      We used “slick” to describe visible surface films of bacteria, a term used in a previous study (DOI: 10.1038/s43705-023-00307-8). “Slick” is equivalent to confluent growth but is simpler to use. For clarity, we added this reference to the text.

      We reorganized the text of Figure 1.

      Suggest more specific language to describe observations. For example: Bacterial loading - S. marcescens growth (for example: the presence of dense fly populations reduced Sm growth).

      Thanks for the suggestions. We replaced some of them.

      Symbiont, microbiota, microbiome, etc were all used interchangeably throughout the manuscript, but I am not sure I would call Sm part of the indigenous microbiome. Suggest to ensure proper usage and then harmonize throughout the ms.

      We used microbes and microbiome to replace symbiont and microbiota, respectively.

      Details missing from the message and Figure legends that would be helpful (including and especially Figure 7 - what AMP concentration?)

      Thanks for valuable comments. According to this comment, we provided concrete details in the Materials and methods and Figure 7 legend about AMPs, including the source and concentration of AMPs line 487-488, 954-955. Please see the response below.

      L73: define 'these issues" maybe or lead better with the prior sentence, it is not evident as currently written.

      We changed "to address these issues" to "To investigate whether and/or how the host modulates bacterial lifestyles," and merged the two paragraphs.

      L74: repetitive sentence with the above.

      Thanks for pointing out this detail. We deleted it.

      L86: naked 'eye'.

      Added.

      L87: what is meant by 'weak flies'?

      Genotypes were added in the updated manuscript. Weak fly stocks display weaker activity and generate fewer eggs than WT flies.

      L96: bacterial load, not loading.

      Corrected.

      L128: no evidence to support, could be reflective of increased numbers in dying/dead larvae that impact total numbers in the vial.

      The number of CFUs of S. marcescens alone gradually decreased at 96 h post-inoculation. In addition, we observed pale biofilm on the surface of the medium at the late stage. Because the numbers of CFUs of S. marcescens alone at the later stages were reduced (compared to the peak load at 48 h post-inoculation), we inferred that the bacteria could undergo ecological suicide. Ecological suicide of a bacterial population was similarly examined by recording the number of CFUs in the medium over time (Ratzke C, et al. Nat Ecol Evol. 2018.). Taken together, we conclude that the bacteria possibly underwent ecological suicide.

      L129: the prior sentence is in contradiction, reduced load only at early time points in the presence of larvae....

      Thanks for pointing out this detail. We added "before 72 h post-inoculation" to the sentence.

      L134: data is only focused on S marcescens, so inferring to 'symbionts' broadly is outside study.

      We changed “symbionts” to “S. marcescens”.

      L139: sentence poorly written and confusing.

      We re-organized this sentence.

      To this end, we sought to examine the S. marcescens lifestyle switch from pathogenicity to commensalism by assessing the respective survival of flies on the fly medium that had been processed by single or coexisting S. marcescens.

      L189: evidence for long-term symbiosis is not well established in this paper, suggest editing this language throughout to more specifically reflect what the data supports and leave such interpretations to discussion points and future work.

      Thanks for your valuable advice. We deleted long-term and “thereby promoting the fitness of symbionts in the long maintenance.”.

      L192; used metabolomics to assess the impacts of larvae on bacterial metabolism, as currently written does not make sense.

      We rewrote this sentence. “Next, we investigated whether larvae could further elicit changes in the metabolism of S. marcescens using untargeted metabolomics.”

      L331: the use of monitored here is not correct/odd.

      We changed 'monitored' to 'reshaping'.

      L340: While the authors initially see a cost to Sm in reduced load (CFUs) at 120 h populations associated with larvae become higher - there is also a cost to producing virulence factors, which their RNASeq and metabolomics data support - trade-offs between growth and virulence.

      Thanks for your suggestion. We added “before 72 hours post inoculation” to define the early stage of the bacterial growth in the sentence.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figures 1 A-D: What defines weak and strong flies, and what criteria determine the robustness of flies? How was the experiment conducted? The manuscript lacks details on this matter.

      We thank you for your comments. We lack a formal criterion; our assessment of the robustness of flies comes from daily experience. Weak fly stocks display weaker activity and generate fewer eggs than WT flies. Genotypes with different robustness were added to the legend in the updated manuscript.

      (2) The authors mentioned, "Noteworthily, the number of CFUs of S. marcescens alone was lower than S. marcescens in co-cultures at the late stage (at 96 h post inoculation), likely that bacteria rapidly exhausted their nutritional resources and underwent ecological suicide." How did they determine that the bacteria exhausted nutritional resources and underwent ecological suicide? One might speculate that larvae could have removed the bacteria simply by consuming them.

      Thanks for this comment. In fact, there were no larvae inside the vials with S. marcescens alone, so bacterial cells were not consumed. However, because the numbers of CFUs of S. marcescens alone at the later stages were reduced (compared to the peak load at 48 h post-inoculation), we inferred that the bacteria could undergo ecological suicide. Ecological suicide of a bacterial population was examined by recording the number of CFUs in the medium over time (Ratzke C, et al. Nat Ecol Evol. 2018.). A similar method was applied to the number of CFUs of S. marcescens. Taken together, we conclude that the bacteria possibly underwent ecological suicide.

      (3) Figure 2E: The experimental details should be provided in the text. What was the CFU of the bacteria used in this survival experiment?

      We provided further experimental details in the legend line 869-870. The same amount of inocula was used in both single and coculturing S. marcescens.

      (4) The experimental data in Figures 2G and 2H do not sufficiently prove the relationship between the width of the cell wall and virulence, as it lacks experimental validation.

      Previous studies (DOI: 10.1371/journal.ppat.1005946) reveal that glucosylating toxins on the surface are primary virulence determinants, so an increased surface-anchored polysaccharide and protein profile promotes the virulence of the pathogen. Alterations in cell surface (the width of the cell wall) can be examined by TE. Moreover, TE was used to observe changes in the virulence of S. marcescens (DOI: 10.1093/nar/gkab1186). We think that the width of the cell wall could be used to reflect virulence in S. marcescens.

      (5) While it's acknowledged that agitation decreases the color intensity of the bacteria, comparing mechanical agitation with larval crawling seems inappropriate, as the mechanical forces exerted by both methods are not of the same magnitude.

      Thanks for the suggestion. In fact, food was agitated more heavily by glass sticks than by larvae, because larvae merely agitated the surface of the food (about 0.5 cm deep). If the decrease in bacterial load and color were related to the magnitude of agitation, larvae would cause a smaller decrease in bacterial load than the sticks. Consequently, this further supports our result that biological factors contribute more to the inhibition of S. marcescens than mechanical force.

      (6) Figure 4D: with this metabolome data, they mentioned, "host suppresses differentiation of S. marcescens into the population with pathogenicity." What evidence supports the claim that downregulation of amino acid metabolism, phosphotransferase system, and ABC transporter directly correlates with decreased pathogenicity?

      Thanks for the comment. Earlier studies showed that amino acid-derived quorum sensing molecules are closely related to bacterial pathogenicity (Defoirdt T. PLoS Pathog. 2019; Wen J, et al. Microbiol Spectr. 2022). Moreover, the phosphotransferase system and ABC transporter can transport and/or produce virulence factors. Therefore, we claimed that downregulation of amino acid metabolism, the phosphotransferase system, and the ABC transporter was related to decreased pathogenicity. To support this claim, we added some references in the updated manuscript (lines 662-664 and 827-830).

      (7) Serotonin: Does serotonin also reduce the virulence of S. marcescens?

      Our primary result showed that serotonin could indeed reduce the virulence of S. marcescens (figure supplement 4), because the survival rate of adult flies was increased and the expression levels of virulence-related genes of S. marcescens alone were reduced in the presence of serotonin.

      (8) Figures 6D, E, H, I: The expression of key genes should be verified using quantitative real-time polymerase chain reaction (qRT-PCR), as scRNA-seq expression levels might not accurately reflect the true expression levels.

      Bacterial single-cell RNA-seq can evaluate alterations in gene expression at single-cell resolution. The expression of key genes screened by scRNA-seq was changed only in subpopulations, so the average expression of these genes would be comparable when mixed with a large population. We are afraid that qRT-PCR would be ill-suited to verify the expression of genes in subpopulations.

      (9) Figure 7: The authors mentioned. "AMPs were supplemented to fly food". However, I could not find information regarding which AMPs and their respective concentrations (i.e., concentration of each AMP) were used in this study. This is a critical aspect of the research; therefore, details should be provided.

      Thanks for your important suggestions. We used the antimicrobial peptide cecropin A, which is produced by silkworms. We provided this information in the methods line 487-488. The concentrations of cecropin A were added in Figure 7 legend.

      (10) Figure 7: Delta AMP + AMP exhibited a stronger effect on the bacteria compared to AMP alone, indicating that immune effectors other than AMP may be involved. Since the IMD pathway is necessary for most immune effectors, including AMP, it would be interesting to test IMD pathway mutant animals and compare them with Delta AMP.

      We appreciate this important question. Indeed, Delta AMP + AMP exhibited a stronger effect on the bacteria compared to AMP alone. We acknowledge that immune effectors other than AMP may be involved. Alternatively, mechanical force may, to a lesser extent, account for the stronger effect on the bacteria (explained by larval agitation in figure supplement 2). To rule out this possibility, we examined the effect of total immune effectors on the bacterial load and the prodigiosin yield of S. marcescens using the IMD pathway mutant (RelE20 larvae). Our result showed that the optical density and yield of prodigiosin in the Delta AMP group did not significantly differ from those in the RelE20 group. Moreover, the load of S. marcescens associated with the Delta AMP mutant was comparable to that of S. marcescens associated with the RelE20 mutant. These results suggested that AMPs play a major role in recapitulating the response of S. marcescens to larvae.

      “To rule out the potential role of other immune effectors, we turned to the IMD pathway mutant RelE20 that is deficient in total immune effectors. Our result showed that the optical density and yield of prodigiosin in RelE20 group did not significantly differ from the ones in DAMP group (figure supplement 7A, B). Moreover, the load of S. marcescens associated with RelE20 mutant was comparable to that of S. marcescens associated with Delta AMP mutant (figure supplement 7C).”

      We have now added these results to the text (lines 326-331).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Comment 1:

      Summary:

      The authors sought to investigate the associations of age at breast cancer onset with the incidence of myocardial infarction (MI) and heart failure (HF). They employed a secondary data analysis of the UK Biobank. They used descriptive and inferential analysis including Cox proportional hazards models to investigate the associations. Propensity score matching was also used. They found that among participants with breast cancer, younger onset age was significantly associated with elevated risks of MI (HR=1.36, 95% CI: 1.19 to 1.56, P<0.001) and HF (HR=1.31, 95% CI: 1.18 to 1.46, P<0.001). They reported similar findings after propensity matching.

      Strengths:

      The use of a large dataset is a strength of the study as the study is well-powered to detect differences. Reporting both the unmatched and the propensity-matched estimates was also important for statistical inference.

      Weaknesses:

      Despite the merits of the paper, readers may get confused as to whether authors are referring to “age at breast cancer onset” or “age at breast cancer diagnosis”. I suppose the title refers to the latter, in which case it will be best to be consistent in using “age at breast cancer diagnosis” throughout the manuscripts. I would recommend a revision to the title to make it explicit that the authors are referring to “age at breast cancer diagnosis”.

      Thank you for your kind comments and suggestions. As you mentioned, in this study we focused on age at breast cancer diagnosis, which was obtained from the cancer registry data in the UK Biobank and was used in all the analyses. We agree that it would be better to consistently use “age at diagnosis of breast cancer” throughout the manuscript for a better understanding; therefore, we have replaced “age at breast cancer onset” with “age at diagnosis of breast cancer”.

      Change in the manuscript:

      “Age at breast cancer onset” was replaced with “age at diagnosis of breast cancer” in the title and throughout the manuscript.

      Recommendations For The Authors:

      Kindly review the references for the location of the full stop. Putting the full stop at the end of the parenthesis makes reading smoother than its current form, as it is difficult to know when the new sentence begins.

      Thank you for your suggestion. We have made revisions to the location of the full stop next to a reference.

      Change in the manuscript:

      The full stop was put at the end of the parenthesis of a reference throughout the manuscript.

      Response to Reviewer #2:

      Comment 1:

      This is a well-presented large analysis from the UK Biobank of nearly 250,000 female adults. The authors examined the associations of breast cancer diagnosis with incident myocardial infarction and heart failure by different onset age groups. Based on results from a series of statistical analyses, the authors concluded that younger onset age of breast cancer was associated with myocardial infarction and heart failure, highlighting the necessity of careful monitoring of cardiovascular status in women diagnosed with breast cancer, especially those younger ones.

      Comments to consider:

      It’s thoughtful for the authors to have included and adjusted for menopausal status, breast cancer surgery, and hormone replacement therapy in their sensitivity analysis. It would be informative if the authors presented the number and percentages of menopause and cancer treatments.

      Thank you for your comments. As suggested, we have provided more detailed information on the number and percentage of menopausal status and breast cancer treatments.

      Change in the manuscript:

      Page 11, Lines 208 to 211: added “Among participants with breast cancer, 11 460 (70.6%) participants were postmenopausal, 14 255 (87.6%) participants had undergone breast cancer surgery, and 6 784 (41.8%) participants had received hormone replacement therapy.”

      Change in the supplementary material:

      The number and percentage of menopausal status, breast cancer surgery, and hormone replacement therapy were added to Table S13.

      a Adjusted for age, ethnicity, education, current smoking, current drinking, obesity, exercise, low-density lipoprotein cholesterol, depressed mood, hypertension, diabetes, antihypertensive drug use, antidiabetic drug use, statin use, menopausal status, breast cancer surgery, and hormone replacement therapy.

      HR, hazard ratio; CI, confidence interval.

      Comment 2:

      The analytical baseline used for follow-up should be pointed out in the methods section. It’s confusing whether the analytic baseline was defined as the study baseline or the time at breast cancer diagnosis.

      We apologize for the confusion. In this study, the analytical baseline used for follow-up was defined as the baseline of UK Biobank (2006-2010) and we have pointed it out in the methods section as suggested.

      Change in the manuscript:

      Page 9, Lines 165 to 166: added: “The analytical baseline used for follow-up was defined as the baseline of UK Biobank (2006-2010).”

      Comment 3:

      Did the older onset age group have a longer follow-up duration? Could the authors provide information on the length of follow-up by age of onset in Supplementary Table S4? It would give the readers more information regarding different age groups.

      Thank you for your question. We compared the time of follow-up among the three diagnosis age groups and found that although the durations of follow-up among the three groups were quite similar (as shown in Table S4), statistical analysis revealed a significant difference with the older diagnosis age group demonstrating a longer follow-up duration (P for Kruskal-Wallis test <0.001). This is understandable as with large sample sizes, even a slight difference could lead to statistical significance. According to your suggestion, we have added information on the length of follow-up by age of diagnosis in Supplementary Table S4.
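      As a rough illustration of this sample-size effect, the sketch below holds a small difference fixed and shows how the p-value shrinks as the groups grow. It uses a two-sample z-test on means, not the Kruskal-Wallis test applied to the follow-up data, and the difference (0.05), the standard deviation (1.0), and the group sizes are hypothetical values chosen only for illustration.

```python
import math


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value of a z-test comparing the means of two equal-sized groups.

    Assumes both groups share the same known standard deviation `sd`.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs: the same small difference tested at growing sample sizes.
for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_z_pvalue(mean_diff=0.05, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

      With these assumed values the difference is far from significant at 100 participants per group but highly significant at 100 000, although the effect itself never changes. This is why effect sizes are worth reading alongside P values in a cohort of this size.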

      Change in the supplementary material:

      Added the median and interquartile range of follow-up in Supplementary Table S4.

      The results are presented as the mean ± standard deviation, or No. (%).

      a The effect sizes are standardized mean differences for continuous outcomes and the Phi coefficient for dichotomous outcomes.

      LDL-C, low-density lipoprotein cholesterol.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study addresses the temporal patterning of a specific Drosophila CNS neuroblast lineage, focusing on its larval development. They find that a temporal cascade, involving the Imp and Syp genes, changes the fate of one daughter cell/branch, from glioblast (GB) to programmed cell death (PCD), as well as gates the decommissioning of the NB at the end of neurogenesis.

      I believe there are some inaccuracies in this summary. We address temporal patterning during larval and pupal stages until the adult stage. The Imp and Syp genes change the fate of one daughter cell/branch from survival to programmed cell death (PCD). The change from glioblast (GB) to PCD, which occurs at an early time point, is not addressed here. The main point of the paper is missing:

      • Last-born MNs undergo apoptosis due to their failure to express a functional TF code, and this code is post-transcriptionally regulated by the opposite expression of Imp and Syp in immature MNs.

      Reviewer #2 (Public Review):

      Summary:

      Guan and colleagues address the question of how a single neuroblast produces a defined number of progeny, and what influences its decommissioning. The focus of the experiments are two well-studied RNA-binding proteins: Imp and Syp. The Authors find that these factors play an important role in determining the number of neurons in their preferred model system of VNC motor neurons coming from a single lineage (LinA/15) by separate functions taking place at specific stages of development of this lineage: influencing the life-span of the LinA neuroblast to control its timely decommissioning and functioning in the Late-born post-mitotic neurons to influence cell death after the appropriate number of progeny is generated. The post-mitotic role of Imp/Syp in regulating programmed-cell death (PCD) is also correlated with a specific code of key transcription factors that are suspected to influence neuronal identity, linking the fate of neuronal survival with its specification. This paper addresses a wide scope of phenotypes related to the same factors, thus providing an intriguing demonstration of how the nervous system is constructed by context-specific changes in key developmental regulators.

      The bulk of conclusions drawn by the authors are supported by careful experimental evidence, and the findings are a useful addition to an important topic in developmental neuroscience.

      I could not summarize the paper better.

      Strengths:

      A major strength is the use of a genetic labeling tool that allows the authors to specifically analyze and manipulate one neuronal lineage. This allows for simultaneous study of both the progenitors and post-mitotic progeny. As a result, the paper conveys a lot of useful information for this particular neuronal lineage. Furthermore, addressing the association of cell fate specification, taking advantage of this lab's extensive prior work in the system, with developmentally regulated programmed cell death is an important contribution to the field.

      Beyond Imp/Syp, additional characterization of this model system is provided in characterizing a previously unrecognized death of a hemilineage in early-born neurons.

      Thanks!

      Weaknesses:

      The main observations that distinguish this study from others that have investigated Imp/Syp in the fly nervous system is the role played in late-born post-mitotic neurons to regulate programmed cell death. This is an important and plausible (based on the presented findings) newly discovered role for these proteins. However the precision of experiments is not particularly strong, which limits the authors claims. The genetic strategy used to manipulate Imp/Syp or the TF code appears to be done throughout the entire lineage, or all neuronal progeny, and not restricted to only the late born cells. Can the authors rule out survival of the early born hemi-lineage normally fated to die? Therefore statements such as this: 

      “To further investigate this possibility, we used the MARCM technique to change the TF code of last-born MNs without affecting the expression of Imp and Syp” should be qualified to specify that the result is obtained by misexpressing these factors throughout the entire lineage.

      We agree that our genetic manipulations affect the entire lineage or all neuronal progeny. We do not have genetic tools to gain such precision. We have changed our descriptions to specify the entire lineage or all neuronal progeny. As the reviewer raised, we were also concerned about the possibility that the overexpression of Imp or knockdown of Syp could induce the survival of the early-born hemilineage. We have two experiments that rule out this possibility:

      (1) In late LL3 larvae, Imp OE or syp MARCM clones do not change the number of cells (see Guan et al., 2022), indicating that the hemilineage eliminated by PCD is not affected. If Imp or Syp played a role in the survival of this hemilineage, we would see at least a 50% increase in the number of MNs at this stage.

      (2) The MARCM experiment using the VGlut driver to overexpress P35 or Imp allows us to manipulate only elav+ VGlut+ neurons. The hemilineage removed by PCD is elav- VGlut- and is not affected by this experiment. Consequently, the increase in MNs in adults with genetic manipulation can only be the result of the survival of the other hemilineage (elav+, VGlut+). Moreover, this experiment shows an increase in the number of neurons in the adult but not in LL3, demonstrating that the hemilineage (elav- VGlut-) is still removed by PCD with this genetic manipulation.

      The authors make an observation that differs from other systems in which Imp/Syp have been studied: that the expression of the two proteins appears to be independent and not influenced by cross-regulation. However there is a lack of investigation as to what effect this may have on how Imp/Syp regulate temporal identity. A key implication of the previously observed cross-regulation in the fly mushroom body is that the ratio of Imp/Syp could change over the life of the NB which would permit different neuronal identities. Without cross-regulation, do the authors still observe a gradient in the expression pattern of time? Because the data is presented with Imp and Syp stained in different brain samples, and without quantification across different stages, this is unclear. The authors use the term 'gradient' but changes in levels of these factors are not evident from the presented data.

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time using smFISH. We have also quantified the relative expression of Imp and Syp protein in the NB over time by co-immunostaining. Additionally, we quantified the relative expression of Imp and Syp protein in postmitotic neurons as a function of their birth order in late LL3 larvae. All these data show opposite temporal gradients of Imp and Syp in the NB and opposite spatial gradients in immature neurons according to their birth order (Figure 4). How these gradients are established in our system remains to be elucidated.

      Reviewer #3 (Public Review):

      This study by Guan and co-workers focuses on a model neuronal lineage in the developing Drosophila nervous system, revealing interesting aspects about: a) the generation of supernumerary cells, later destined for apoptosis; and, b) new insights into the mechanisms that regulate this process. The two RNA-binding proteins, Imp and Syp, are shown to be expressed in temporally largely complementary patterns, their expression defining early vs later born neurons in this lineage, and thus also regulating the apoptotic elimination. Moreover, neuronal 'fate' transcription factors that are downstream of Imp and signatures of early-born neurons, can also be sufficient to convert later born cells to an earlier 'fate', including survival.

      The authors provide solid evidence for most of their statements, including the temporal windows during which the early and the later-born motoneurons are generated by this model lineage, how this relates to patterns of cell death by apoptosis and that mis-expression of early-born transcription factors in later-born cells can be sufficient to block apoptosis (part of, and perhaps indicative of the late-born identity).

      Other studies have previously outlined analogous, mutually antagonistic roles for Imp and Syp during nervous system development in Drosophila, in different parts and at different stages, with which the working model of this study aligns.

      Overall, this study adds to and extends current working models and evidence on the developmental mechanisms that underlie temporal cell fate decisions.

      I could not summarize the paper better.

      Reviewer #1 (Recommendations For The Authors):

      While this is an interesting topic, I raised two issues in my original review.

      (1) Against the backdrop of numerous previous studies linking many developmental regulators, including tTFs, to programmed cell death in the developing CNS, which in several cases have involved identifying key PCD genes and decoding the molecular regulatory interplay between regulators and PCD genes, this study does not provide any new insight into the regulation of developmental PCD in the CNS.

      The authors have not added any new data to address this shortcoming.

      I agree with the reviewer that we did not attempt to link Imp/Syp with the temporal transcription factor (tTF) cascade or spatial selectors such as Hox genes. However, this decision was intentional as our primary focus was on studying immature MNs. It is worth noting that the decommissioning of NBs by autophagic cell death or terminal differentiation, which is mediated by Imp/Syp in other lineages, has not been correlated with tTFs or spatial selectors. Although we have not directly examined the involvement of the hb + sv > kr > pdm > cas > cas-svp > Grh cascade in the decommissioning of the Lin A neuroblast, our preliminary data indicate that Hb, Sv, Pdm, and Cas are not expressed in the Lin A NB, while Grh is consistently expressed in the NB (Wenyue et al., 2022). Thus, it is unlikely that this particular tTF cascade is implicated in Lin A neuroblast decommissioning. In contrast, spatial selectors, such as the Hox gene Antp, play an opposing role compared to HOX transcription factors in abdominal NBs. In the Lin A lineage, Antp promotes survival (Baek, Enriquez, & Mann, 2013). Here, to avoid repeating what has already been described in the literature, we focused on the role of Imp/Syp in postmitotic neurons and revealed that the precise elimination of MNs is linked to the control of TFs expressed in the MNs.

      (2) I raised the issue that it is unclear if Imp/Syp acts in the NB, and/or in IMC/GMC, and/or in the daughter cells generated from these.

      I agree with the reviewer's concern regarding the unclear function of Imp/Syp, i.e., whether it acts in the NB, IMC/GMC, or daughter cells. To address this, one possible approach would be to attempt rescuing Imp and Syp mutants by transgenic expression in specific cell types, such as NBs, IMC/GMC, or GB/daughter cells. However, we have not conducted such experiments as we were skeptical about the outcome. Previously published work has used drivers expressed in NBs, IMC/GMC, or postmitotic neurons to decipher the function of a gene in a specific cell type, but the results of these experiments must be taken with caution. Using NB/GMC drivers to study gene function can lead to effects not only in the NB but also in its progeny, including GMC or postmitotic neurons, due to the perdurance and stability of the Gal4 and UAS-gene expression system. For instance, dpn-Gal4 UAS-GFP not only labels the NB but also many of its progeny, even if Dpn is only expressed in NBs. And elav-Gal4 is expressed in the NB and GMCs.

      However, our overexpression of Imp in immature neurons using Vglut demonstrates that Imp promotes cell survival through an autonomous function in these neurons. This driver is only expressed in postmitotic neurons (elav+) and not in the NB, IMC/GMC, or in the hemilineage eliminated by cell death (elav-vglut-).

      Reviewer #2 (Recommendations For The Authors):

      Oddly knockdown of Imp in the neuroblast (Fig. 5D) only led to death at 8h APF, when Imp is no longer expressed. Do the authors have an explanation as to how the stem cell can survive until this point? A discussion would be helpful.

      The simple explanation is the efficiency of RNAi. The imp-/- MARCM clones (Guan et al., 2022) lead to a stronger reduction of MNs in LL3.

      A simple experiment I would recommend is to repeat the antibody stainings of staged larvae/pupae (Fig. 4) having the anti-Imp/Syp antibodies in the same brain sample, and perhaps a quantification of the ratio in the NB. Given the species in which the ABs were raised seem compatible, this should be feasible. As it stands now, there is no indication of whether the ratio of Imp vs Syp change over time.

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in the NB over time and quantified the relative expression of Imp and Syp proteins in postmitotic neurons as a function of their birth in late LL3 larvae. How these gradients are established in our system still remains to be identified.

      Minor errors/suggestions:

      Fig 4. Time legend at the top goes A, B, C, E, F (no D). So it doesn't match the panels below

      Yes, we have made the corrections.

      Sentence repeated in Intro:

      The process of terminating NB neurogenesis through autophagic cell death or terminal differentiation is commonly referred to as decommissioning.

      Yes, corrections have been made.

      IN FIGURE 1 THEY SAY 'TYPE IB' AND IN FIGURE 2 THEY SAY 'TYPE 1B'

      We have changed it to type 1b.

      In Fig2A-It's hard to see lack of Elav and Fig2G-It's hard to see presence of Dcp1. Panels could be adjusted to emphasize these results

      We have increased the size of the panels and made two separate panels where only the elav and Dcp1 signals are present.

      Observations that the result is equivalent in all thoracic segments is expected, since all legs need the same number of neurons. This is nice to have but can be in the supplement.

      Overall the figure number seems excessive, especially considering much of the results included (particularly the NB results) are findings consistent with previous papers and some is characterization of the system that does not fit well with the main focus regarding Imp/Syp (i.e., death of one hemi-lineage):

      Figure 5 and 6 can be joined as one.

      We have combined Figures 5 and 6, showing only the T1 segments.

      There is some discrepancy between graphs Fig7F and K: At LL3 the number of neurons is different for the control in 7F and the count in K

      Yes, because the genetic backgrounds are not the same and we are not counting the same type of cells. In 7F, we are counting the elav+ and VGlut+ cells, whereas in Figure 7K, we are counting all the elav+ in Lin A, including those elav+ VGlut-. VGlut expression arrives a bit later after elav+, which is why we have fewer elav+ cells in 7F. In other words, VGlut MARCM clones do not label all Lin A elav+ cells. I have clarified this in the figure.

      Reviewer #3 (Recommendations For The Authors):

      Main comment: on the notion of Imp and Syp gradients:

      p. 5, related to figure 4 - there are clearly distinct windows for predominantly (if not exclusively) Imp, and later, Syp expression in lineage 15, with a phase of co-expression.

      However, based on the data shown, it is unclear whether these windows represent gradients, as repeatedly stated. If the notion of gradients is derived from other studies, on other lineages, then this would be good to clarify. Alternatively, the idea of temporally opposing gradients of Imp and Syp would need to be demonstrated for this lineage.

      For example, a more accurate way to describe this study's data is given on p.7 "In conclusion, our findings demonstrate that the opposite expression pattern of Imp and Syp in postmitotic neurons precisely shapes the size of Lin A/15 lineage by controlling the pattern of PCD in immature MNs (Fig. 8)."

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in postmitotic neurons as a function of their birth in late LL3 larvae. How these gradients are established in our system still remains to be identified.

      Minor points:

      p.6, related to figure 7: Are numbers of EDU- early born and EDU+, late born, MNs expressed as means in the main text? As written, it suggests absence of any variability, which one would expect and which is shown in Fig.7 data.

      Yes, we have added averages in the text.

      Methods: the author name 'Lacin' has been mis-spelled

      Sorry about that, it's been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper proposes a valuable new method for the assessment of the mean kurtosis for diffusional kurtosis imaging by utilizing a recently introduced sub-diffusion model. The evidence supporting the claims that this technique is robust and accurate in brain imaging is incomplete. The work could be of interest in the research and clinical arena.

      We thank the editors for their assessment and the reviewers for their careful reading and feedback that helped to improve the manuscript. We have addressed all the reviewers’ concerns and would like to request an update of the assessment to reflect the revisions we have made.

      Below, we address the reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study introduces an innovative method for assessing the mean kurtosis, utilizing the mathematical foundation of the sub-diffusion framework. In particular, a new fitting technique that incorporates two different diffusion times is proposed to estimate the parameters of the sub-diffusion model. The evaluation of this technique, which generates kurtosis maps based on the sub-diffusion framework, is conducted through simulations and the examination of data obtained from human subjects.

      We thank Reviewer #1 for pointing out the novelty and innovation of our work.

      Strengths:

      The utilization of the sub-diffusion model for tissue characterization is a significant conceptual advancement for the field of diffusion MRI. This study adeptly harnesses this approach for an accurate estimation of the parameters of the widely employed diffusion model, DKI, leveraging their established analytical interconnection as evidenced in prior research. Notably, this approach not only proposes a robust, fast, and accurate technique for DKI parameter estimation but also underscores the viability of deploying the sub-diffusion model for tissue characterization, substantiated by both simulated and human subject analyses. The paper is very well written, well organized, and coherent. The simulation study included different aspects of water diffusion as captured by diffusion-weighted MRI such as varying diffusion times and different b-value subpopulations, resulting in a comprehensive and thorough discussion.

      We thank Reviewer #1 for highlighting the strengths of our work.

      Weaknesses:

      The primary objective of this study is to demonstrate a robust approach for estimating DKI parameters by directly calculating them using the parameters of the sub-diffusion model. This premise, however, relies on the assumption that the sub-diffusion model effectively characterizes the diffusion MRI signal and that its parameters are both robust and accurate. Throughout the manuscript, the term "ground truth kurtosis K" is frequently used to denote the "true K" value in the context of the simulation study. Nonetheless, given that the data is simulated using the new sub-diffusion model - an approximation of the DKI-based signal expression- this value cannot truly be considered the "ground truth K". The simulation study highlights the robustness and accuracy of D* and K*, but it inherently operates under the assumption that the observed data is in the form of the sub-diffusion model.

      It is correct that our study operates under the assumption that the observed data is in the form of the sub-diffusion model, and indeed one of the key outcomes of this work is to demonstrate the effectiveness of that assumption and the new possibilities it brings. Naturally, using any mathematical model at all carries assumptions. Over the past two decades, many mathematical and biophysical models have been proposed to characterise diffusion MRI signals. However, model validation remains an open challenge in the field. In this, as well as in our previous work (Yang et al, NeuroImage, 2022), we have shown that our proposed sub-diffusion model not only provides a much better fitting compared to the traditional DKI method, overcoming the major limitation of the traditional DKI method on the maximum b-value, but also generates brain maps with superior tissue contrast and elucidates previously unseen structure.

      We have replaced the term “ground truth kurtosis K” with “true kurtosis K”.

      The comment “… using the new sub-diffusion model – an approximation of the DKI-based signal expression…” is a bit misleading. In fact we propose that the reverse interpretation is the more suitable way to view the relationship: the DKI model is a degree-2 approximation of the sub-diffusion model, as in eq. (7).

      Reviewer #2 (Public Review):

      Summary: The authors present a technique for fitting diffusion magnetic resonance images (dMRI) to a sub-diffusion model of the diffusion process within brain imaging. The authors suggest that their technique provides robust and accurate calculation of diffusional kurtosis imaging parameters from which high quality images can be calculated from short dMRI data acquisitions at two diffusion times.

      Strengths: If the authors can show that the dMRI signal in brain tissue follows a sub-diffusion model decay curve then their technique for accurately and robustly calculating diffusional kurtosis parameters from multiple diffusion times would be of benefit for tissue microstructural imaging in research and clinical arenas.

      In Figure 7, we showed that the diffusion MRI signals follow the sub-diffusion model decay curves.

      Weaknesses: The applied sub-diffusion model has two parameters that are invariant to diffusion time, D_β and β which are used to calculate the diffusional kurtosis measures of a diffusion time dependent D* and a diffusion time invariant K*. However, the authors do not demonstrate that the D_β, β and K* parameters are invariant to diffusion time in brain tissue.

      In our proposed sub-diffusion model, D_β and β are assumed to be time-independent parameters, which is a key strength of the approach. The goal is to characterise tissue-specific properties (D_β for diffusivity and β for the extent of tissue complexity) that do not rely on the diffusion time setting in diffusion MRI experiments. To extract such time-independent properties, we proposed a new sampling and fitting strategy – fitting at least two diffusion time data together.

      The authors' results visually show that there is time dependence of the K* measure (in Figure 6) that is more apparent in white matter with K* values being higher for diffusion times of ∆=49 ms than ∆ = 19 ms. The diffusion time dependence of K* indicates there is also diffusion time dependence of β.

      The discrepancies in the fitted K* for ∆ = 19 ms and ∆ = 49 ms separately do not necessarily imply that there is a true time dependence in these parameters. Rather, this can be explained by a deficiency of data when fitting a two-dimensional surface (S is a function of q and ∆) based on data along a single curve for a fixed value of ∆.  Without properly sampling the surface across two independent coordinates, one cannot expect a fully reliable fit.  Indeed, a great advantage of our proposed method is to allow fitting data with multiple values of ∆, and thereby getting a richer data set with which to fit the full signal surface S(q, ∆).  The results for fitting ∆ = 19 ms and ∆= 49 ms data together clearly show the benefits of this approach, with superior contrast achieved.

      Furthermore, Figure 7 shows that there is a tissue specific root mean squared error in model fitting over the two diffusion times which indicates greater deviation from the model fit in white matter than grey matter.

      Although the errors are not completely tissue-independent, please note the magnitude of the RMSE is very small. The quality of the fitting in both white and grey matter is shown in sub-figures (A)-(H) for several representative voxels.

      To show that the sub-diffusion model is robust and accurate (and consequently that K* is robust and accurate) the authors would have to demonstrate that there is no diffusion time-dependence in both D_β and β in application to brain imaging data for each diffusion time separately. Simulated data should not be used to demonstrate the robustness and accuracy of the sub-diffusion model or to determine optimization of dMRI acquisition parameters without first demonstrating that D_β and β are invariant to diffusion time. This is because simulated signals calculated by using the sub-diffusion characteristic equation of dMRI signal decay will necessarily have diffusion time invariant D_β and β parameters. Without further information demonstrating diffusion time invariance of D_β, β and K* it is not possible to determine whether the authors have achieved their aims or that their results support their conclusions.

      First, as explained above, the dMRI signal S is a function of q and ∆, i.e., a two-dimensional surface S(q, ∆), and hence fitting data sampled from single diffusion time (i.e., one curve on the surface) cannot provide reliable parameters, as seen in the discrepancies in K* in Figure 6 (bottom two rows). Our proposed new sampling and fitting strategy overcomes this issue. That is, to obtain a reliable fitting, one should fit data from at least two diffusion times together (i.e., sampling data from at least two curves on the signal surface).

      Second, to demonstrate that D_β and β are time invariant, one would require data at several diffusion times with high b values. Such data cannot be easily obtained. The data used in this current study is the MGH Connectome 1.0 human brain data, which only contains two diffusion times, ∆ = 19 ms and ∆ = 49 ms.

      Hence, we conducted numerical experiments to demonstrate our idea. In Figure 3, we showed that (i) the variability of the fitted parameters is significantly reduced when moving from fitting single diffusion time data to two diffusion time data, and (ii) the difference in fitting three diffusion times compared to two is very minor, indicating convergence towards the correct time-independent parameter values. The results from fitting human brain data (Figure 6 and Tables 2-4) agree with the expectations from our numerical experiments. Hence, we believe that we have provided sufficient evidence to support our proposed sub-diffusion model and its optimal fitting strategy.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is clear that the authors preferred generating the data by using sub-diffusion model's signal expression as it has many benefits, such as allowing different diffusion times to be incorporated, and hence investigation of the effect of the number of diffusion times on the accuracy of the parameter fitting. I recommend adding another simulation study by generating the data with the DKI model expression (as the goal of the study is to provide an accurate mapping of diffusional mean kurtosis), fitting the data to the sub-diffusion model's expression in Eq. (10), and then calculating K* and D* by Eqs. (8) and (9) only for a fixed diffusion time and one b-value subset.

      We appreciate the suggestion. However, unfortunately it is not appropriate to generate data with the DKI model, as the maximum b-value is limited to 2000~3000s/mm^2 and hence the DKI model cannot represent diffusion MRI signals from a full spectrum of b-values. A key strength of our proposed model is that it removes this limitation.

      There is a typo on Page 24, Line 581; "b<=2400" should be b>=2400.

      We have fixed this typo.

      Reviewer #2 (Recommendations For The Authors):

      As the authors state the sub-diffusion model has two parameters, D_β and β that are invariant to diffusion time, and give rise to a time-varying diffusion coefficient in mm^2s^-1 and a time invariant kurtosis. However, there is a need to be clearer and more specific about the implications of the sub-diffusion model. The manuscript would be improved by the authors:

      (a) Defining the time-varying diffusion coefficient that arises from the model, its functional form and properties.

      We refer Reviewer #2 to eq. (5) and eq. (8) for the definition of the time-varying diffusion coefficients D* and D_SUB and their relationship.

      (b) Clearly discuss the implications of this with respect to other time-varying diffusion coefficient methods in the current literature.

      We refer Reviewer #2 to the section “Time-dependence of diffusivity and kurtosis” under “Discussions”.

      (c) Demonstrating that D_β and β do not vary with diffusion time when estimated from dMRI acquired on human participants.

      We have addressed this comment in the public review.

      The manuscript would benefit from increases in clarity in all sections and the authors identifying typographical errors.

      We have updated the relevant text in the revised manuscript to make it clearer, including fixing typos.

      Specific improvements to clarity in the methods and results section would include:

      Line 620: Why were parameter approximations for model fitting to simulated data restricted to the ranges D_β∈[10^(-4),10^(-3) ] and β∈[0.5,1] but in fitting to brain imaging data the ranges were D_β>0 and 0<β<=1.

      The parameter ranges for model fitting to both the simulated and human data were set to be the same: D_β>0 and 0<β<=1. To generate simulated data, the D_β and β ranges were restricted to reflect observations in human brain data. We have updated the text to make this clearer.

      Lines 622, 628 & 629: Which goodness of fit measure was used?

      The goodness of fit measure for all simulated results is the coefficient of determination, or R^2 value, as noted in the “Goodness-of-fit and region-based statistical analysis” section under Methods. We have updated the text to make this clearer.

      Line 666: The method for computation of R^2 within the coefficient of determination should be stated as there are several ways of calculating an R^2 value.

      The formula for computing R^2 has been added to the text.
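
      As an illustration of the point raised here, the following is a minimal sketch of one common definition of the coefficient of determination, R^2 = 1 - SS_res / SS_tot. It is an assumption that this is the variant intended; the exact formula added to the revised manuscript is not reproduced in this response, and the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Assumed (common) definition; other variants of R^2 exist.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared deviation of the fit from the data.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviation of the data from their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot
```

      Under this definition a perfect fit gives 1, a fit no better than the mean of the data gives 0, and a fit worse than the mean gives a negative value.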

      Line 685: A t-test is mentioned but it is not clear as to the inputs to this test, or where the results of this analysis are presented.

      We have updated the text to make this clearer. The results of this analysis are presented in Table 5. The entries identified in italic under the optimal b-value heading were found to be significantly different from the benchmark mean K* reported in Table 2.

      Line 696: It is not clear how the intra-class correlation coefficient histograms are computed from six subjects. This applies to results in Figure 10 that require greater clarity in the description.

      The formula for computing the intra-class correlation coefficient has been added to the sub-section “Scan-rescan analysis using intraclass correlation coefficient (ICC)” under “Methods”.
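
      For orientation only, here is a minimal sketch of one standard form of the intraclass correlation coefficient, the one-way random-effects ICC(1,1), applied to a single measurement location with one row per subject and one column per scan session. Which ICC form the manuscript uses is not stated in this response, so the choice of ICC(1,1), the function name `icc_oneway`, and the data layout are all assumptions.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    measurements: list of rows, one row per subject, each row holding the
    k repeated measurements (e.g. scan and rescan) for that subject.
    Assumed form: (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 measurements")
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (msb - msw) / denom
```

      Evaluating such a function at every voxel, with the six subjects as rows and the scan and rescan values as columns, would be one way to obtain a distribution of ICC values that can be shown as a histogram.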

      It would be helpful if the authors primarily report results pertaining to the model parameters D_β and β. This is because D* and K* are calculated from D_β and β. Conditions for robust and accurate estimation of D_β and β will provide robust and accurate measures for D* and K*.

      Two new tables for the model parameters D_β and β have been added. Please see Tables 3 and 4 in the revised manuscript.

      The authors state that fitted model parameters are not affected by maximum b-value (paragraph beginning line 366). This statement is based on their model simulation results. Could the authors provide data to support this based on the application of their model to the human brain imaging data?

      We would like to clarify that our statement is indeed based on human brain imaging. As stated in the paragraph beginning line 366, both results in Table 2 (using full dataset) and Table 5 (using dataset with optimal b-value sampling) are generated from the Connectome human brain data. If maximum b-value dependence is present, benchmark (Table 2) versus optimal region-specific results (Table 5, or previously Table 3) should show some systematic difference.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth considering and exploring further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall the study has been improved by the revision. However, one piece of new data is perplexing to me. The new Figure 7 presents the results of a model analysis of the strength of the EI caused by a second fish to localize when the focal fish is chirping. From my understanding of this type of model, EOD frequency is not a parameter in the model since it evaluates the strength of the field at a given point in time. Therefore the only thing that matters is the phase relationship and strength of the EOD. Assuming that the second fish's EOD is kept constant and the phases relationship is also the same, the only difference during a chirp that could affect the result of the calculation is the potential decrease in EOD amplitude during the chirp. It is indeed logical that if the focal fish decreased its EOD amplitude the target fish's EOD becomes relatively stronger. Where things are harder to understand is why the different types of chirps (e.g. type 1 vs type 2) lead to the same increase in signal even though they are typically associated with different levels of amplitude modulations. Also, it is hard to imagine that a type 2 chirps that is barely associated with any decrease in EOD amplitude (0-10% maybe), would cause doubling of the EI strength. There might be something I don't understand but the authors should provide a lot more details on how this result is obtained and convince us that it makes sense.

      We thank the Reviewer for the comments and we agree that the approach could have been better detailed. As anticipated by the Reviewer, the Boundary Element Method (BEM) model can be used simply to calculate the electric field and electric image at a specific point in time (instantaneously), regardless of EOD frequency. However, our model allows for the concatenation of consecutive instants and thus is able to render an entire sequence of electric fields - and resulting electric images - incorporating realistic EOD characteristics such as shape, duration, and frequencies (see Pedraja et al., 2014).

      Chirp-triggered EIs were modeled using real chirps produced by interacting fish. Each chirp was thus associated with its duration and peak parameters, as well as the fish positional information (distance and angle).

      However, since we did not know the beat phase at which chirps were produced, we computed electric images for each fish position and chirp scenario by simulating various phases (4 equally spaced values). These are intended as phases of the sender EOD and simply refer to the initial offset between the two interacting EODs. However, since our simulations were run over a time window of 500 msec, all phases are likely to be covered, with a different temporal order relative to the chirp (always centered within the 500 msec).

      The simulation was run maintaining consistent timing for both chirp and non-chirp conditions, across approximately 800 body nodes. At each node, the current flow was calculated from the peak-to-peak of the EOD sum (i.e. the point-to-point of the difference between the beat positive and negative envelopes). Analyzing the EIs over this fixed time window enables us to assess the unitary changes of current flow induced by chirps over units of time (ΔI/Δt). From this, we can calculate a cumulative sum of current flow changes - expressed as delta(EI) and use it to show the effect of the chirps on the spatiotemporal EI (Figure 7C).

      One can express this cumulative change mapped onto the fish body (keeping the 800 points separated, as in Figure 7C) or further sum the current changes to obtain a single total (as shown in Figure 7D).

      One can check this as follows: summing over, for example, 500 of the 800 points (judging from the size of the blue areas in C, not all 800 points show a detectable change), each valued at 0.1 to 0.3 mA/s, gives circa 100 mA/s, which is what is shown in D. (is this what is happening ?)
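
      The order-of-magnitude check above can be written out explicitly. The sketch below uses only the figures quoted in this response (about 500 contributing nodes, each with a change between 0.1 and 0.3 mA/s); the function name is hypothetical, and the assumption that every contributing node falls within that range is a simplification.

```python
def total_current_change_bounds(n_nodes, per_node_low, per_node_high):
    """Bounds on the summed change when each of n_nodes contributes
    a value between per_node_low and per_node_high (same units)."""
    return n_nodes * per_node_low, n_nodes * per_node_high


# Figures quoted in the response: ~500 of the 800 body nodes,
# each changing by 0.1 to 0.3 mA/s.
low, high = total_current_change_bounds(500, 0.1, 0.3)
midpoint = (low + high) / 2
```

      With these inputs the total lies between 50 and 150 mA/s, with a midpoint of 100 mA/s, which is consistent with the magnitude described for Figure 7D.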

      We do not know why chirps of different types triggered similar effects. It is possible that, since EI measurements are pooled over several chirps produced at different angles and distances, in case of a lower amount of chirps considered for a given type (as in the case of rises, very low) these measurements may not highlight more marked differences among types. In a publication we are currently working on, we are considering a larger dataset to better assess these results.

      The methods section has been edited to clarify the approach (not yet).

      Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation.

      Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field.

      We thank the Reviewer for the extensive and constructive comments. We would like to add that, while it is true that many detailed studies have been published on the anatomy and physiology of the circuits implicated in the production and modulation of “electric chirps”, most of this research assumed, and focused exclusively on, their possible role in communication. In addition, most behavioral studies did the same, and a meta-analysis of the existing literature on chirping allows one to trace the communication idea back mainly to two studies: Hagedorn and Heiligenberg, 1985 (“Court and spark: electric signals in the courtship and mating of gymnotoid fish”) and Hopkins, 1974 (“Electric Communication: Functions in the Social Behavior of Eigenmannia Virescens”). Importantly, in these studies only contextual observations were made (no playback experiments or other attempts to analyze more quantitatively the correlation of chirping with other behaviors).

      The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and fails to sufficiently consider the evidence from a large body of existing research. Ultimately, the manuscript presents very interesting data that is sure to stimulate discussion and follow-up studies, but it suffers from dismissing evidence in support of, or consistent with, a communicative function for chirps.

      Although the tone of some statements present in our earlier draft may suggest otherwise, through our revisions we have made an effort to clarify that we do not intend to dismiss a function of chirps in communication; we only intend to debate and discuss valid alternative hypotheses, advanced from reasonable considerations.

      Before writing this manuscript, we attempted to survey literally all the existing literature on chirps (including studies focused on behavior, peripheral sensory physiology as well as brain physiology). Although it is not unlikely that some studies have eluded our attention, an effort for a comprehensive review was made. Based on this survey we realized that none of the studies provided a clear and unambiguous piece of evidence to support the communication hypothesis (we refer here to the weak points highlighted in the discussion and mentioned in the previous comment), which in fact does not come without its weak points and contradictions (see later comments).

      What follows is a summary of the mentions of the communication theory in the different sections of the manuscript, including several edits we have applied in response to the Reviewer’s concern:

      In the abstract we clearly state that the alternative we are considering is only hypothetically, not certainly, complementary. Nonetheless, we have identified a couple of instances in the following section that could sound dismissive of the “communication hypothesis”.

      In the introduction we in fact write about the possibility of interference between communication signals and conspecific electrolocation cues, as they are both detected as beat perturbations. We did not mean “interference” here as “reciprocal canceling”; rather, we intended it as “partial or more or less conspicuous overlap” in the responses triggered in electroreceptors.

      Hoping to convey a clearer message, we have edited the related statement and changed it to “both types of information are likely to overlap and interact in highly variable ways”.

      We have also removed the statement: “According to this idea, beats and chirps are not only detected through the same input channel, but also used for the same purpose.” as at this point in the manuscript it may be too strong.

      In the results section we do not include statements that might be seen as dismissive of the communication hypothesis but only statements in support of the “probing with chirps” idea (which is the central hypothesis of the study).

      In the discussion paragraphs we elaborate on why the current functional view is either flawed or incomplete (first paragraph, “existing functional hypotheses”). Namely: 1) multiple triggering factors implied in chirp responses covary and need to be disentangled (for example, DF and sex); 2) findings on brown ghosts and a few other gymnotiforms have been used to advance the hypothesis of “communication through chirps” in all weakly electric fish (including pulse species); 3) social encounters - in which chirps are recorded - also imply other behaviors (such as probing) which have not been considered so far, a point related to the first one on covariates; 4) most studies referring to big chirps as courtship chirps were not done in reproductive animals (added now); and 5) no causal evidence has been provided so far to justify a role of chirps in social communication.

      We are discussing these points as challenges to the communication hypothesis, not to dismiss the hypothesis, but rather to motivate future studies addressing these challenges.

      We do not want to appear dismissive of the communication hypothesis and had therefore previously edited the manuscript to avoid the impression of exclusivity of the probing hypothesis. We have now gone over the manuscript once more and edited several sentences. Nevertheless, we want to point out again that - despite the large consensus - the communication hypothesis has, until now, never been investigated with the kind of rigor applied here.

      The authors do acknowledge that chirps could function as both a communication and homeactive sensing signal, but it seems clear they wish to argue against the former and for the latter, and the evidence is not yet there to support this.

      In both rounds of revision we have made an effort to convey a more inclusive interpretation of our findings. We tried our best to express our ideas as hypothetical, not as proof that communication through chirps does not exist. The aim of this study is to propose an alternative view, and this cannot be done without underlining the weak points of an existing hypothesis while providing and supporting reasonable arguments in favor of the alternative we advance. The actual evidence for a role of chirping in communication is much weaker than the sheer number of articles discussing chirps in this context would suggest.

      Regarding the weak evidence against communication, here we can list a few additional important points related to the proposed interpretations of chirp function (more specific than those made earlier):

      (1) A formally sound assessment of signal value/meaning - as typically done in animal communication studies - should involve:

      a) the isolation of a naturally occurring signal and determination of the context in which it is produced 

      b) the artificial replication of the signal

      c) the observation that such a mimic is capable of triggering reliable and stereotyped responses in a group of individuals (identified by sex and/or species) under the same conditions (conditioned, unconditioned, state-dependent, etc.), as discussed for instance in Bradbury and Vehrencamp, 2011; Laidre and Johnstone, 2013; Wyatt, 2015; Rutz et al., 2023.

      This approach has so far not been applied to weakly electric fish. The initial purpose of the present study was in fact to conduct this type of validation.

      (2) The hypothesis that chirps are used for DF-sign discrimination - for “social purposes” - although plausible on theoretical grounds, does not seem reasonable in practice when one considers emission rates of 150 chirps per minute. We do find a strong correlation of chirp type with DF, which is often very abrupt and sudden (as if the fish were tracking beat frequency to guess its value), but the consideration made above on chirp rates seems to discourage this interpretation.

      (3) The hypothesis of chirp patterning (i.e. chirping may have meaning based on the sequence of chirps of different types, a bit like syllables in birdsong) - assessed by only one study, conducted in our group - has not been sufficiently substantiated by replication. We have surveyed all possible combinations of chirps produced by interacting pairs in different behavioral conditions, using different values for chirp sequence size: 2, 3, ..., 8 chirps (considering both the sender alone and sender+receiver together). In all cases we found no evidence for a context-dependent “modulation” of chirp types (i.e. no specific chirp type sequence in specific contexts).

      (4) The hypothesized role of “large chirps” as courtship signals could be easily criticized by noting the symmetrical distribution of these events around a DF of 0 Hz. One could invoke a failure to discriminate DF-sign to explain this well-known pattern. However, we know from Walter Heiligenberg’s work and physiological considerations that such a task can be solved easily through t-units and … in principle even just by motion (which would change the EOD phase in frequency-dependent ways, thus potentially revealing the DF sign).

      Overall, these considerations made us think that chirping certainly occurs in a social context, but that the meaning of this behavior remains elusive. We noticed that environmental factors are also strongly implied … we then formulate an alternative hypothesis to explain chirping, but we do so without dismissing the communication idea.

      All this seems to us just a careful way to critically discuss our results and those of other studies, without considering the issue resolved.

      In the introduction, the authors state, "Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat, and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering." I disagree with this statement, which seems to be a key assumption. Both of these features certainly modulate the activity of electroreceptors, but that does not mean those modulations are ambiguous as to their source. You do not know whether the two types of modulations can be unambiguously decoded from electroreceptor afferent population activity.

      We thank the Reviewer for noting this imprecision. We have addressed the Reviewer’s concern in another reply (see above).

      My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Chirps tended to occur in close proximity to another fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered behavioral or social context? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with. The authors show quite convincingly that the types of chirps produced do not vary with these contexts, but chirp rates do.

      We believe the “perceived claim” may be an issue of unclear writing. We have now tried to better clarify that “context” affects chirp rates, but it does not affect chirp types as much (except when beat frequency is high).  

      We have edited two statements possibly susceptible to misinterpretation: 

      (1) In the results: “It also indicates that chirp parameters such as duration and FM do not seem to be associated with any particular context in a meaningful way, other than being affected by beat frequency.”

      (2) In the discussion: the statement

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context (Figure S2) although the variance of chirp parameters appears to be significantly affected by this factor (Figure 2). This may suggest that the effect of behavioral context is mainly detectable in the number of chirps produced (Figure S1), rather than the type (Figure S2).”

      has been changed to:

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context, except for those cases characterized by higher beat frequencies  (Figure S2). This suggests that the effect of behavioral context highlighted in our factor analysis (Figure 2) is mainly due to the number of chirps produced (Figure S1), rather than their type (Figure S2).”

      Finally, in the results we emphasize the relatively higher impact of previously unexplored factors on chirp variance: “The plot of individual chirps (Figure 2C) shows the presence of clustering around different categorical variables and it reveals that experience levels or swimming conditions are important factors affecting chirp distribution (note for instance the large central “breeding” cluster in which fish are divided and the smaller ones in which fish are free). Sender or receiver identity does not individuate any clear clustering relative to either sex (see the overlap of male_s/male_r and female_s/female_r) or social status (dominant/subordinate). Chirps labeled based on tank experience (i.e. resident vs intruder) are instead clearly separated.”

      Further, in your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We do note that male brown ghosts chirp more than females, but we also say - and show in Figure 8 - that males move more in proximity to and around conspecifics. We acknowledge that the chirp time-course may differ during playbacks in a type-dependent manner, but how this could support the communication hypothesis - or other alternatives - is unclear. This result could equally imply the use of different chirp types for different probing needs. Since we cannot be sure about either, we do not want to put too much emphasis on it. Ultimately, the fact that “context” (here meant broadly, to define different experimental situations in which social but also physical and environmental parameters are altered) affects chirping is undeniable: cluttered and non-cluttered environments do represent different contexts, which affect chirping differently and conspicuously.

      In the results, the authors state, "Overall, the majority of chirps were produced by male subjects, in comparable amounts regardless of environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) or social experience (novel or experienced; Figure S1D)." This is not what is shown in Figure S1. S1A shows clear differences between resident vs. intruder males, S1B shows clear differences between dominant vs. subordinate males, and S1D shows clear differences between naïve and experienced males. The analysis shown in Figure 2 would seem to support this. Indeed, the authors state, "Overall, this analysis indicated that environmental and social experience, together with beat frequency (DF) are the most important factors explaining chirp variability."

      The Reviewer is right to point out this imprecise reference, and we are grateful to them for spotting the incongruence. The text probably refers to an earlier version of the figure in which data were grouped and analyzed differently. We have now edited the text and changed it to: “Overall, the majority of chirps were produced by male subjects, at rates that seemed affected by environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) and social experience (novel or experienced; Figure S1D).”

      The choice of chirp type varied widely between individuals but was relatively consistent within individuals across trials of the same experiment. The authors interpret this to mean that chirping does not vary with internal state, but is it not likely that the internal states of individuals are stable under stable conditions, and that individuals may differ in these internal states across the same conditions? Stable differences in communication signals between individuals are frequently interpreted as reflecting differences between those individuals in certain characteristics, which are being communicated by these signals.

      It seems we have been unclear in the writing here. While it is true that behavioral states are stable and could imply stable chirp patterning (if the two are related), chirp types vary abruptly and in a reliable DF-dependent manner; different chirp types are therefore unlikely to be matched to different internal states that follow the same temporal order so reliably (repeated similarly across consecutive trials).

      This would imply the occurrence of different internal states in rapid sequence, reliably triggered by repeated EOD ramps, regardless of whether the playback is 20 sec long or 180 sec long.

      We have edited this paragraph to better explain this: “The reliability by which the chirping response adapts to both the rate and direction of beat frequency is variable across individuals but rather stable across trials (relative to a given subject), further suggesting that chirp type variations may not reflect changes in internal states or in the animal motivation to specific behavioral displays (which are presumably subject to less abrupt variations and stereotypical patterning based on DF).”

      I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring.

      The only groups in which 1-2 and 2-1 transitions are as frequent as 1-1 and 2-2 (1 and 2 being the numerical IDs of the two interacting fish) are F-F pairs. This is because chirp rates in females are so low that within-fish correlations end up being as low as between-fish correlations. We believe the Reviewer’s impression could be due to the fact that these are normalized maps (see legend of Figure 5A-B).

      Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer that this is a possibility. To address this point, we edited the results section to acknowledge that what we see may be related to the time window chosen (i.e. 4 sec):

      “More importantly, they show that - at least in the social conditions analyzed here and within small-sized time windows - chirp time series produced by different fish during paired interactions are consistently independent of each other.”
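
      To make the distinction between fine timing and overall rate concrete, the following is a minimal sketch of a cross-correlogram between the chirp times of two fish. It is not the analysis code used in the study: the function name, the 0.5 s bin width and the toy chirp times are assumptions introduced for illustration, and only the 4 s window comes from the text above.

```python
def cross_correlogram(times_a, times_b, window=4.0, bin_width=0.5):
    """Histogram of lags (b - a) between chirps of two fish within +/- window seconds.

    times_a, times_b: chirp times in seconds (hypothetical input format).
    bin_width is an assumed value chosen for illustration.
    """
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    for ta in times_a:
        for tb in times_b:
            lag = tb - ta
            # Keep only pairs that fall inside the analysis window.
            if -window <= lag < window:
                counts[int((lag + window) // bin_width)] += 1
    return counts


# Toy data: one chirp from fish A, three from fish B (one outside the window).
fish_a = [10.0]
fish_b = [10.6, 13.9, 20.0]
correlogram = cross_correlogram(fish_a, fish_b)
```

      A flat correlogram of this kind only indicates a lack of time-locking within the chosen window; it says nothing about whether chirp rates covary over longer timescales, which is the possibility raised by the Reviewer.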

      Further, it is not clear to me how "transitions" were defined. The methods do not make this clear, and it is not clear to me how you can have zero chirp transitions between two individuals when those two individuals are both generating chirps throughout an interaction.

      We thank the Reviewer for bringing up this unclear point. We have now clarified how transitions were calculated in the method section: “The number of chirp transitions present in each recording (dataset used for Figures 1, 2, 5) was measured by searching in a string array containing the 4 chirp types per fish pair, all their possible pairwise permutations (i.e. all possible permutations of 4+4=8 elements are: 1-1, 1-2, 1-3 … 7-6, 7-7, 7-8; considering the following legend 1 = fish1 type 1, 2 = fish 1 type 2, 3 = fish1 type 3 … 6 = fish2 type 2, 7 = fish2 type 3 and 8 = fish2 rise).”.

      Zero transitions are possible if two fish (or groups of fish) do not produce chirps of all types. Only transitions of produced types can be counted.
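
      As a minimal sketch of this kind of count (not the code used for the manuscript), the example below tallies ordered pairs of consecutive chirp labels. It assumes that a “transition” is a pair of consecutive chirps in the recording, and that labels 4 and 5, which the quoted legend elides, stand for the fish 1 rise and fish 2 type 1 respectively; both are assumptions.

```python
from collections import Counter
from itertools import product

# Assumed labels, extending the legend quoted above:
# 1-4 = fish 1 (types 1, 2, 3, rise), 5-8 = fish 2 (types 1, 2, 3, rise).
LABELS = range(1, 9)


def count_transitions(sequence):
    """Count every ordered pair of consecutive chirp labels in a recording."""
    counts = Counter(zip(sequence, sequence[1:]))
    # Report all 8 x 8 ordered pairs so that unobserved transitions appear as 0.
    return {pair: counts.get(pair, 0) for pair in product(LABELS, repeat=2)}


# Toy recording: fish 1 emits only type 2 chirps (label 2), fish 2 only rises (label 8).
recording = [2, 2, 8, 2, 2, 2, 8]
transitions = count_transitions(recording)
```

      In this toy recording both fish chirp throughout, yet every transition involving a chirp type that was never produced (for example 1-5) is necessarily zero.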

      In the results, "Although all chirp types were used during aggressive interactions, these seemed to be rather less frequent in the immediate surround of the chirps (Figure 6A)." A lack of precise temporal correlation on short timescales does not mean there is no association between the two behaviors. An increased rate of chirping during aggression is still a correlation between the two behaviors, even if chirps and specific aggressive behaviors are not tightly time-locked.

      The Reviewer is right in pointing out the limited temporal scaling of our observations/analysis. We have now edited the last paragraph of the results related to figure 6 to include the possibility mentioned by the Reviewer: “The significantly higher extent of chirping during swimming and locomotion, consistently confirmed by 4 different approaches (PSTH, TM, CN, MDS), suggests that - although chirp-behavior correlations may exist at time-scales larger than those here considered - chirping may be linked more strongly with scanning and environmental exploration than with a particular motivational state, thus confirming findings from our playback experiments.”

      The Reviewer raises an important point here; however, due to space limitations, we have considered only a sub-second scale. Most playback experiments in weakly electric fish used EOD mimics for a few tens of seconds - to avoid habituation in the fish behavioral responses - while inter-chirp intervals usually range from a few hundred milliseconds to seconds (depending on how often a fish chirps). This suggested to us that a 4 second time window may not be a bad choice to start with.

      In summary, it is simply too strong to say that chirping does not correlate with context, or to claim that there is convincing evidence arguing against a communication function of chirps. Importantly, however, this does not detract from your exciting and well-supported hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect this is quite common in electric fish (both gymnotiforms and mormyrids), and perhaps in other actively sensing species such as echolocating animals. The two are not mutually exclusive.

      We agree with the Reviewer that context - broadly speaking - does affect chirping (as we mentioned above). We hope we have improved the writing and clarified that we do not dismiss communication functions of chirping, but we do lean towards electrolocation based on the considerations made above and our results.

      We conclude the manuscript by remarking that communication and electrolocation are not mutually exclusive: “probing cues could function simultaneously as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning, if properly timed (Henninger et al., 2018).” (see the conclusion paragraph of the discussion).

      Therein, we further add “These findings aim to stir the pot and initiate a discussion on possible alternative functions of chirps beyond their presumed communication role.”.

      With this, we hope we’ve made it clear how we intend our manuscript to be read.

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to its role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish and as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which is fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

      We thank the reviewer for the kind assessment.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely, especially since beating - and therefore detection of some aspects of beating due to the proximity of a conspecific - most often precedes chirping. One meaning the authors suggest, tentatively, is that the chirps could enhance electrosensory responses to the beat, for example by causing beat phase shifts that remediate blind spots in the electric field of view.

      We agree with the Reviewer that a better and more detailed explanation of how beat processing for conspecific electrolocation may be positively affected by chirps would be important to provide. We are currently working on a follow-up manuscript in which we intend to include these aspects. For space limitations and readability we had to discard from the current manuscript a lot of results that could further clarify these issues.

      A second criticism is that the study links the beat detection to underwater object localization. The paper does not significantly develop that line of thought given their data - the authors tread carefully here given the speculative aspect of this link. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument derives mainly from the notion of Fourier analysis with pulse type fish EOD waveforms (see above, and radar theory more generally), where higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether experiments can show this to be significant.

      Perhaps the Reviewer refers to the last discussion paragraph before the conclusions in which we mention the performance of pulse or wave-type EODs in electrolocation (referring here to ideas illustrated in a recent review by Crampton, 2019). We added to this paragraph a statement which could better clarify that we do not propose that chirping could enhance object electrolocation. What we mean is that, in a context in which object electrolocation occurs through wave-type EODs - given the generally lower performance of such narrow-band signals in resolving the spatial features of any object, even a 3D electric field  - chirping could improve beat detection during social encounters by increasing the amount of information obtained by the fish.

      The edited paragraph now reads: “While broadband pulse signals may be useful to capture highly complex environments rich in foliage, roots and other structures common in vegetation featuring the more superficial habitats in which pulse-type fish live, wave-type EODs may be a better choice in the relatively simpler river-bed environments in which many wave-type fish live (e.g., the benthic zone of deep river channels; Crampton, 2019). In this case, achieving a good spatial resolution is critical during social encounters, especially considering the limited utility of visual cues in these low-light conditions. In such habitats, social encounters may “electrically” be less “abrupt”, but spatially less “conspicuous” or blurred (as a 3D electric field may be). In such a scenario, chirps could serve as a means to supplement the spatial information acquired via the beat, accentuating these cues during periods of reduced resolution.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      None, my points in the original review have been properly addressed in this resubmission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presented a useful toolkit designed for CyTOF data analysis, which integrates 5 key steps as an analytical framework. A semi-supervised clustering tool was developed, and its performance was tested in multiple independent datasets. The tool was compared to human experts as well as supervised and unsupervised methods. 

      Strengths: 

      The study employed multiple independent datasets to test the pipeline. A new semi-supervised clustering method was developed. 

      Weaknesses: 

      The examination of the whole pipeline is incomplete. Lack of descriptions or justifications for some analyses. 

      We thank the reviewer for the overall summary and comments on this manuscript. In the last part of the results, we showcased the functionalities of ImmCellTyper on the covid dataset, including quality checks, BinaryClust clustering, cell abundance quantification, state marker expression comparison within each identified cell type, cell population extraction, subpopulation discovery using unsupervised methods, and data visualization. We added more descriptions in the text based on the reviewer’s suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors have developed marker selection and k-means (k=2) based binary clustering algorithm for the first-level supervised clustering of the CyTOF dataset. They built a seamless pipeline that offers the multiple functionalities required for CyTOF data analysis. 

      Strengths: 

      The strength of the study is the potential use of the pipeline for the CyTOF community as a wrapper for multiple functions required for the analysis. The concept of the first line of binary clustering with known markers can be practically powerful. 

      Weaknesses: 

      The weakness of the study is that there's little conceptual novelty in the algorithms suggested from the study and the benchmarking is done in limited conditions. 

      We thank the reviewer for the overall summary and comments on this manuscript. While the concept of binary clustering by k-means is not novel, BinaryClust only uses it on individual markers to identify positive and negative cells, then combines the result with the pre-defined matrix for cell type identification. This approach has not been introduced elsewhere. Furthermore, ImmCellTyper streamlines the entire analysis process and enhances data exploration on multiple levels. For instance, users can evaluate functional marker expression level and cellular abundance across both main cell types and subpopulations. In addition, this computational framework leverages the advantages of both semi-supervised and unsupervised clustering methods to facilitate subpopulation discovery. We believe these contributions warrant consideration as advancements in the field.

      As for the benchmarking, we limited the depth to main cell types rather than subpopulations, because we only apply BinaryClust to identify main cell types. For cell subset discovery, the unsupervised methods integrated in this pipeline have already been published and are widely used by the research community. Additional benchmarking therefore does not seem necessary.

      Reviewer #3 (Public Review): 

      Summary: 

      ImmCellTyper is a new toolkit for Cytometry by time-of-flight data analysis. It includes BinaryClust, a semi-supervised clustering tool (which takes into account prior biological knowledge), designed for automated classification and annotation of specific cell types and subpopulations. ImmCellTyper also integrates a variety of tools to perform data quality analysis, batch effect correction, dimension reduction, unsupervised clustering, and differential analysis. 

      Strengths: 

      The proposed algorithm takes into account the prior knowledge. 

      The results on different benchmarks indicate competitive or better performance (in terms of accuracy and speed) depending on the method. 

      Weaknesses: 

      The proposed algorithm considers only CyTOF markers with binary distribution. 

      We thank the reviewer for the overall summary and comments on this manuscript. Binary classification can be considered an imitation of the human gating strategy, as it is applied to each marker. For example, when characterizing CD8 T cells, we aim for the CD19-CD14-CD3+CD4- population, which is binary in nature (either positive or negative) and follows the same logic as the method (BinaryClust) we developed. Results indicated that it works very well for well-defined main cell lineages, particularly when the expression of the defining marker is not continuous. The limitation lies in subpopulation identification, because a handful of markers behave in a continuous manner. We therefore suggest applying an unsupervised method after BinaryClust, which brings the additional advantage of identifying unknown subsets beyond our current knowledge, something none of the semi-supervised tools can achieve. To address the reviewer’s concern, we considered the limitation of binary distribution, but it does not profoundly affect the application of the pipeline.
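
      The per-marker logic described above can be sketched as follows. This is a minimal illustration, not the BinaryClust source code: the function names, the `CELL_TYPES` table, and the expression values are hypothetical, and a plain one-dimensional k-means with k=2 stands in for whatever clustering settings the package actually uses.

```python
# Illustrative sketch only: NOT the BinaryClust implementation.
# Marker names follow the CD8 T cell example in the text; all expression
# values and the CELL_TYPES table are hypothetical.

def two_means_threshold(values, n_iter=50):
    """Split 1-D values into low/high groups with k-means (k=2); return the cut point."""
    lo, hi = min(values), max(values)
    for _ in range(n_iter):
        cut = (lo + hi) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not low or not high:
            break  # all values identical: nothing to split
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def binarize(cells, markers):
    """Label each cell '+' or '-' for every marker, one marker at a time."""
    cuts = {m: two_means_threshold([c[m] for c in cells]) for m in markers}
    return [{m: "+" if c[m] > cuts[m] else "-" for m in markers} for c in cells]


def annotate(signs, cell_types):
    """Return the first cell type whose required signs all match, else 'unclassified'."""
    for name, required in cell_types.items():
        if all(signs[m] == s for m, s in required.items()):
            return name
    return "unclassified"


# Hypothetical pre-defined matrix: rows are cell types, columns are markers.
CELL_TYPES = {
    "CD8 T cell": {"CD19": "-", "CD14": "-", "CD3": "+", "CD4": "-"},
    "CD4 T cell": {"CD19": "-", "CD14": "-", "CD3": "+", "CD4": "+"},
    "B cell": {"CD19": "+", "CD3": "-"},
}

# Hypothetical transformed expression values for four cells.
cells = [
    {"CD19": 0.1, "CD14": 0.2, "CD3": 4.1, "CD4": 0.3},
    {"CD19": 0.2, "CD14": 0.1, "CD3": 3.9, "CD4": 3.8},
    {"CD19": 4.5, "CD14": 0.3, "CD3": 0.2, "CD4": 0.1},
    {"CD19": 0.1, "CD14": 4.2, "CD3": 0.3, "CD4": 0.2},
]

labels = [annotate(s, CELL_TYPES) for s in binarize(cells, list(cells[0]))]
print(labels)
```

      A cell that matches no row of the table is left unclassified, which mirrors the point made above: populations not defined in advance have to be found by a separate unsupervised step.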

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Many thanks for the reviewers’ comments and suggestions. Please see the point-by-point response below:

      (1) The style of in-text reference citation is not consistent. Many do not have published years.

      The style of the reference citation has been revised and improved.  

      (2) The font size in the table of Figure 1 is too small, so is Figure 2. 

      The font size has been increased.

      (3) Is flowSOM used as part of BinaryClust? How should the variable running speed of BinaryClust be interpreted, given that it is occasionally slower and sometimes faster than flowSOM in the datasets?

      To answer the reviewer’s question, flowSOM is not part of BinaryClust. They are separate clustering methods that have been incorporated into the ImmCellTyper pipeline. As described in Figure 1, BinaryClust, a semi-supervised method, is used to classify the main cell lineages, while flowSOM, an unsupervised method, is recommended here for further subpopulation discovery. They therefore operate independently of each other. To avoid confusion, we slightly modified Figure 1 for clarification.

      Regarding the variability in running speed in Figure 4, the performance of algorithms can indeed be influenced by the characteristics of the datasets, such as size and complexity. The differences between the covid dataset and the MPN dataset, such as marker panel, experimental protocol, and data acquisition process, could account for this variation. Our explanation is that flowSOM is better suited to the data structure of the covid dataset, which might be why it analyses that dataset slightly faster than the MPN dataset. Moreover, for the covid dataset, the runtime of both BinaryClust and flowSOM is less than 100s, and the difference is not notable.

      (4) In the Method section ImmCellTyper workflow overview, it is difficult to link the description of the pipeline to Figure 8. There are two sub-pipelines in the text and seven steps in the figure. What are their relations? Some steps are not introduced in the text, such as Data transformation and SCE object construction. What is co-factor 5?

      Figure 8 provides an overview of the entire workflow for CyTOF data analysis, starting from the raw fcs file data and proceeding until downstream analysis (seven steps). But the actual implementation of the pipeline was divided into two separate sections, as outlined in the vignettes of the ImmCellTyper GitHub page (https://github.com/JingAnyaSun/ImmCellTyper/tree/main/vignettes).

      Users will initially run ‘Intro_to_batch_exam_correct’ to perform data quality check and identify potential batch effects, followed by ‘Intro_to_data_analysis’ for data exploration. We agree with the reviewer that the method for this section is a bit confusing, so we’ve added more description for clarification.

      In processing mass cytometry data, an arcsinh (inverse hyperbolic sine) transformation is commonly applied to handle zero values and skewed distributions, and to improve visualization as well as clustering performance. The co-factor is a parameter that scales down the data before the transformation, controlling the width of the approximately linear region around zero. We usually get the best results using a co-factor of 5 for CyTOF data.
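
      The co-factor scaling described above can be sketched in a few lines. This is a generic illustration using the inverse hyperbolic sine (`asinh`), the transform conventionally used for mass cytometry data, and not the pipeline's own code; the function name and the raw intensities are hypothetical.

```python
import math

COFACTOR = 5  # the value the response reports using for CyTOF data


def arcsinh_transform(intensity, cofactor=COFACTOR):
    """Divide a raw intensity by the co-factor, then apply the inverse hyperbolic sine."""
    return math.asinh(intensity / cofactor)


# Hypothetical raw marker intensities, including a zero.
raw = [0.0, 1.0, 5.0, 50.0, 5000.0]
transformed = [arcsinh_transform(x) for x in raw]

for x, t in zip(raw, transformed):
    print(f"{x:>8.1f} -> {t:.3f}")
```

      Values well below the co-factor are mapped almost linearly, larger values are compressed roughly logarithmically, and zero stays at zero, which is why a log transform's problem with zero counts does not arise.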

      (5) For differential analysis, could the pipeline analyze paired/repeated samples?

      For the statistical step, ImmCellTyper supports both two-study group comparison using Mann-Whitney Wilcoxon test, and multiple study group comparison (n>2) using Kruskal Wallis test followed by post hoc analysis (pairwise Wilcoxon test or Dunn’s test) with multiple testing correction using Benjamini-Hochberg Procedure.

      The pipeline is also flexible: users can extract the raw cell frequency data and apply whichever statistical methods suit their study design.
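
      As an illustration of the multiple testing correction named above, the Benjamini-Hochberg adjustment can be sketched as follows. This is a generic textbook version written for this note, not code taken from ImmCellTyper, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per cell type comparison.
p_raw = [0.01, 0.04, 0.03, 0.20]
p_adj = benjamini_hochberg(p_raw)
print([round(p, 4) for p in p_adj])
```

      Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from ending up with a larger adjusted value than its neighbour.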

      (6) In Figure 2A, the range of the two axes is different for Dendritic cells, which could be misleading. Why the agreement is bad for dendritic cells?

      The range of each axis is automatically adapted to the data, which explains why the two ranges are not necessarily equal. The correlation coefficient for DCs is 0.958. This is lower than for the other cell types (> 0.99) but does not indicate poor agreement.

      Moreover, DCs are much less abundant than other cell types, comprising approximately 2-5% of all cells. As a result, even small differences in abundance may appear as large variations. For example, a difference of 1% in DC abundance can represent up to a 2-fold change, which can be perceived as substantial.

      Overall, while the agreement for DCs may appear comparatively lower, it is not necessarily indicative of poor performance, considering both the correlation coefficient and the low relative abundance of DCs compared to other cell types.

      (7) In the Results section BinaryClust achieves high accuracy, what method was used to get the p-value, such as lines 212, 213, etc.?

      The accuracy of BinaryClust was tested using F-measure and ARI against ground truth (manual gating), the detailed description/calculation can be found in methods. For line 212 and 213, the p-value was calculated using ANOVA for the interaction plot shown in Figure 3. We’ve now added the statistical information into the figure legend.   
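
      For readers unfamiliar with the metric, a per-cell-type F-measure against manual gating can be sketched as below. It uses the standard definition (harmonic mean of precision and recall); the authors' exact calculation is the one given in their Methods, and the labels here are hypothetical.

```python
def f_measure(truth, predicted, cell_type):
    """F-measure for one cell type: harmonic mean of precision and recall."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == cell_type and p == cell_type for t, p in pairs)
    fp = sum(t != cell_type and p == cell_type for t, p in pairs)
    fn = sum(t == cell_type and p != cell_type for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct calls: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: manual gating (ground truth) vs. automated calls.
truth = ["T", "T", "T", "B", "B", "DC"]
predicted = ["T", "T", "B", "B", "B", "DC"]

scores = {ct: f_measure(truth, predicted, ct) for ct in ("T", "B", "DC")}
print(scores)
```

      Because the score is computed per cell type, a rare population with only a few misassigned cells can score noticeably lower than an abundant one.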

      (8) The performance comparison between BinaryClust and LDA is close. The current comparison design looks unfair. Given LDA only trained using half data, LDA may outperform BinaryClust.

      It is true that LDA was trained using half of the data. This is because the method requires manual gating results as a training dataset to build a model, which is then applied to the remaining files to label cell types. Here we used 50% of the whole dataset as the training set. We are of course very happy to implement any suggestions for a better partition ratio.

      (9) There are 5 key steps in the proposed workflow. However, not every step was presented in the Results.

      Thanks for the comments. The results primarily focused on demonstrating the precision and performance of BinaryClust in comparison with ground truth and existing tools. Additionally, a case study showcasing the application and functions of the entire pipeline on a dataset was also presented. Due to space limitations, the implementation details of the pipeline were described in the Methods section and the GitHub documentation, which users and readers can easily access.

      Reviewer #2 (Recommendations For The Authors): 

      The tools suggested by the authors could be potentially useful to the community. However, it's difficult to understand the conceptual novelty of the algorithms suggested here. The concept of binary clustering has been described before (https://doi.org/10.1186/s12859-022-05085-z, https://doi.org/10.1152/ajplung.00104.2022), and it mainly utilizes k-means clustering set to generate binary clusters based on selected markers. Other algorithms associated with the package are taken from other studies. 

      We acknowledge the reviewer’s comment regarding the novelty of our method. While the concept of binary clustering by k-means has previously been described for transcriptome data, our approach applies it to CyTOF data analysis, which has not been introduced elsewhere. Furthermore, ImmCellTyper streamlines the entire analysis process and enhances data exploration on multiple levels. For instance, users can evaluate functional marker expression level/cellular abundance across both main cell types and subpopulations. In addition, as stated in the manuscript, this computational framework leverages the advantages of both semi-supervised and unsupervised clustering methods to facilitate subpopulation discovery. We believe these contributions warrant consideration as advancements in the field.

      In addition, the benchmarking of clustering performance, especially to reproduce manual gating and comparison to tools such as flowSOM is not comprehensive enough. The result for the benchmarking test could significantly vary depending on how the authors set the ground truth (resolution of cell type annotations). The authors should compare the tool's performance by changing the depth of cell type annotations. Especially, the low abundance cell types such as gdT cells or DCs were not effectively captured by the suggested methods. 

      Thanks for the comment. We appreciate the reviewer’s concern. However, as illustrated in Figure 1, our approach uses BinaryClust, a semi-supervised method, to identify main cell types rather than directly targeting subpopulations. This is because a semi-supervised method relies on the user’s prior definitions and is therefore limited in its ability to discover novel subsets. In the ImmCellTyper framework, an unsupervised method is subsequently applied for subset exploration following the BinaryClust step.

      Regarding benchmarking, we focused on testing the precision of BinaryClust for main cell type characterization, because that is what the method is used for in the pipeline, and we believe this is sufficient. As for cell subset discovery, the unsupervised methods we integrated have already been published and are widely used by the research community, so additional benchmarking does not seem necessary.

      Moreover, as shown in Figure 3 and Table 1, our results indicated that the F-measure for DCs and gdT cells in BinaryClust is 0.80 and 0.92 respectively, which were very close to ground truth and outperformed flowSOM, demonstrating its effectiveness. 

      We hope these clarifications address the reviewer’s concern.

      Minor comments: 

      (1) In Figure 4, it's perplexing to note that BinaryClust shows the slowest runtime for the COVID dataset, compared to the MPN dataset, which features a similar number of cells. What causes this variation? Is it dependent on the number of markers utilized for the clustering? This should be clarified/tested. 

      Thanks for the comment, but we are not sure that we fully understand the question. As shown in Figure 4, BinaryClust has a slightly higher runtime on the MPN dataset than on the covid dataset, which is reasonable because the MPN dataset contains around 1.6 million more cells than the covid dataset.

      (2) Some typos are noted: 

      - DeepCyTOF and LDA use a maker expression matrix extracted → "marker"?* 

      Corrected.

      - Datasets(Chevrier et al.)which → spacing* 

      Corrected.

      - This is due to the method's reliance → spacing*

      Corrected.

      Reviewer #3 (Recommendations For The Authors): 

      Is it possible to accommodate more than two levels within the clustering process, i.e., can the proposed semi-supervised clustering tool be extended to multi-levels instead of binary?

      Thanks for the comments. Binary classification can be considered an imitation of the human gating strategy, as it is applied to each marker. For example, when characterizing CD8 T cells, we aim for the CD19-CD14-CD3+CD4- population, which is binary in nature (either positive or negative) and follows the same logic as the method (BinaryClust) we developed. Results indicated that it works very well for well-defined main cell lineages. The limitation lies in subpopulation identification, because a handful of markers behave in a continuous manner. We would therefore suggest applying an unsupervised method after BinaryClust, which brings the additional advantage of identifying unknown subsets beyond our current knowledge, something none of the semi-supervised tools can achieve. To answer the reviewer’s question, it is possible to set the number of levels to 3, 4, or 5 rather than just 2, but considering the design and rationale of the entire framework (as described in the manuscript and above), it doesn’t seem to be necessary.

      Could you please comment on why on the COVID dataset, BinaryClust was slower as compared to flowSOM?

      Thanks for the question. The performance of algorithms can indeed be affected by the characteristics of the datasets, such as their size and complexity. The covid and MPN datasets differ in various aspects, including marker panel, experimental protocol, and data acquisition process, which would account for the observed variation in speed. Our explanation is that flowSOM is better suited to the structure of the covid dataset than to that of the MPN dataset. Additionally, for the covid dataset, both BinaryClust and flowSOM have runtimes of less than 100s, and the difference between the two isn’t particularly dramatic.

      Minor errors: 

      Line#215 "(ref) " reference is missing

      Added.

      Figure 3, increase the font of the text in order to improve readability. 

      Increased.

      Line#229 didn't --> did not. 

      Corrected

      Line#293 repetition of the reference. 

      The repetition is due to the format of the citation, which has been revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary: 

      In this study, Nandy and colleagues examine neural and behavioral correlates of perceptual variability in monkeys performing a visual change detection task. They used a laminar probe to record from area V4 while two macaque monkeys detected a small change in stimulus orientation that occurred at a random time in one of two locations, focusing their analysis on stimulus conditions where the animal was equally likely to detect (hit) or not-detect (miss) a briefly presented orientation change (target). They discovered two behavioral measures that are significantly different between hit and miss trials - pupil size tends to be slightly larger on hits vs. misses, and monkeys are more likely to miss the target on trials in which they made a microsaccade shortly before target onset. They also examined multiple measures of neural activity across the cortical layers and found some measures that are significantly different between hits and misses. 

      Strengths: 

      Overall the study is well executed and the analyses are appropriate (though multiple issues do need to be addressed). 

      We thank the reviewer for their enthusiasm and their constructive comments which we address below.

      Weaknesses: 

      My main concern with this study is that with the exception of the pre-target microsaccades, the physiological and behavioral correlates of perceptual variability (differences between hits and misses) appear to be very weak and disconnected. Some of these measures rely on complex analyses that are not hypothesis-driven and where statistical significance is difficult to assess. The more intuitive analysis of the predictive power of trial outcomes based on the behavioral and neural measures is only discussed at the end of the paper. This analysis shows that some of the significant measures have no predictive power, while others cannot be examined using the predictive power analysis because these measures cannot be estimated in single trials. Given these weak and disconnected effects, my overall sense is that the current results do not significantly advance our understanding of the neural basis of perceptual variability. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) Most of the effects are very small. For example, the difference in pupil size between hits and misses is ~0.08 z-score units. The differences in firing rates between hits and misses are in the order of 1-2% of normalized firing rates. While these effects may be significant, their contribution to perceptual variability could be negligible, as suggested by the analysis of predictive power at the end of the result section. On a related note, it would be useful to mention the analysis of predictive power earlier in the paper. The finding that some of the measures do not have significant predictive power w/r to behavioral outcome raises questions regarding their importance. Finally, it would strengthen the paper if the authors could come up with methods to assess the predictive power of the PPC and interlaminar SSC. Without such analyses, it is difficult to assess the importance of these measures. 

      We expect that relatively small differences in early to intermediate sensory areas could cumulatively result in large differences in higher areas and contribute to the binary distinction between hits and misses. We certainly do not claim that these results completely explain state-dependent differences that determine the outcome of these trials. Instead, we have focused on neural signatures at the level of the V4 columnar microcircuit that might ultimately contribute to the variability in perception.

      We would like to emphasize that, based on the reviewer’s recommendation, we have now analyzed our results separately for each animal (see below). The consistency and significance of our findings across both animals give us confidence that what we have reported here are important neural signatures underlying perceptual variability at threshold.

      We would also like to note that SSC and PPC are now part of the standard toolkit of systems neuroscience and have been employed in numerous studies to our knowledge. While all measures come with their set of caveats and limitations, these two measures provide a frequency-resolved metric of the relationship between two temporal processes (point or continuous), which we believe provide insights into the interlaminar flow of information that we report here.

      Unfortunately, limitations in the GLM method and the reliability of these analyses with limited data make it impossible for these two measures to be included. The GLM requires all variables to be defined for each trial in the input. SSC and PPC can be undefined at low firing rates and require a substantial amount of data to be reliably calculated. While we did consider imputing data or estimating SSC and PPC using multiple trials, we ultimately did not pursue this idea as the purpose of the GLM was to use simultaneous measurements from single trials. 

      (2) What is the actual predictive power of the GLM model (i.e., what is the accuracy of predicting whether a given held-out trial will lead to a hit or a miss)? How much of this predictive power is accounted for by the effect of microsaccades? 

      As the GLM is not a decoder, it does not classify whether a given left out trial will be a hit or a miss. However, the GLM was highly predictive compared to a constant model. This information has been added to Table 3. The deviance of the GLM with and without microsaccades as a variable was not significantly different (p >0.9).  
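
      To make the comparison with a constant model concrete, the sketch below computes the binomial deviance of a set of fitted hit probabilities and of a constant model that always predicts the overall hit rate. It only illustrates the general idea of a deviance comparison; the outcomes and fitted probabilities are hypothetical, and the authors' actual GLM specification is not reproduced here.

```python
import math


def binomial_deviance(outcomes, probabilities):
    """-2 * log-likelihood of binary outcomes under predicted hit probabilities."""
    return -2 * sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )


# Hypothetical trial outcomes (1 = hit, 0 = miss) and fitted probabilities.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
model_p = [0.8, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.1]

# Constant model: every trial gets the overall hit rate.
base_rate = sum(outcomes) / len(outcomes)
constant_p = [base_rate] * len(outcomes)

dev_model = binomial_deviance(outcomes, model_p)
dev_constant = binomial_deviance(outcomes, constant_p)
print(f"model: {dev_model:.2f}, constant: {dev_constant:.2f}")
```

      A lower deviance means the model's probabilities fit the observed outcomes better; whether a drop in deviance is significant is then assessed with a separate test, as in the comparison with and without microsaccades reported above.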

      (3) The role of stimulus contrast is not explained clearly. Are all the analyses and figures restricted to a single contrast level? Was the contrast the same on both sides? If multiple contrasts are used, could contrast account for some of the observed neural-behavioral covariations? 

      All of the analyses include stimuli of all tested contrast levels. Stimulus contrasts were the same at both locations (attended and unattended). We have added a more detailed description of the contrast in hit and miss trials (Lines 289-296), reproduced here:

      “Non-target stimulus contrasts were slightly different between hits and misses (mean: 33.1% in hits, 34.0% in misses, permutation test, 𝑝 = 0.02), but the contrast of the target was higher in hits compared to misses (mean: 38.7% in hits, 27.7% in misses, permutation test, 𝑝 = 1.6e−31). Firing rates were normalized by contrast in Figure 3. In all other figures, we considered only non-target stimuli, which had very minor differences in contrast (<1%) across hits and misses. While we cannot completely rule out any other effects of stimulus contrast, the normalization in Figure 3 and minor differences for non-target stimuli should minimize them.”
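
      A permutation test of the kind cited in the passage above can be sketched as follows. This is a generic two-sided test on the difference in means, written for illustration; the contrast values, the number of permutations, and the seed are hypothetical and do not reproduce the authors' analysis.

```python
import random


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means of two groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        pa, pb = pooled[: len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical target contrasts (%) on hit and miss trials.
hits = [38.0, 40.0, 37.5, 39.0, 41.0, 38.5, 40.5, 39.5, 37.0, 42.0]
misses = [28.0, 27.0, 29.5, 26.0, 28.5, 27.5, 30.0, 26.5, 29.0, 25.0]

p_value = permutation_test(hits, misses)
print(f"p = {p_value:.4f}")
```

      The smallest p-value such a test can report is bounded by the number of permutations, so a very small reported value implies either a very large number of permutations or an exact or approximated null distribution.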

      (4) Do the animals make false alarms (i.e., report seeing a target in non-target epochs)?

      If not, then it is not clear that the animals are performing near their perceptual threshold. If the false-alarm rate is non-zero, it should be reported and analyzed for neural/behavioral correlates. Does the logistic regression fit allow for a false alarm rate? More generally, it would be useful to see a summary of behavioral performance, such as distribution of thresholds, lower and upper asymptotes, and detection rates on foil trials vs. matched target trials. 

      The logistic regression does allow for a false alarm rate. We have reported additional behavioral parameters in Figure 1-figure supplement 3A-G.  

      (5) As far as I can tell, all the analyses in the paper are done on data combined across the two animals. Given that these effects are weak and that the analyses are complex, it is important to demonstrate for each analysis/figure that the results hold for each animal separately before combining the data across animals. This can be done in supplementary figures. 

      We have updated the paper to include all main results plotted separately for each animal as supplementary figures. 

      - Figure 2-figure supplement 2

      - Figure 3-figure supplement 1

      - Figure 3-figure supplement 2

      - Figure 4-figure supplement 1

      - Figure 5-figure supplement 2

      - Figure 7-figure supplement 1

      All the results except for the canonical correlation analysis were present, consistent, and significant when we analyzed them in each monkey independently.

      (6) The selection of the temporal interval used for the various analyses appears somewhat post hoc and is not explained clearly. Some analyses are restricted to the period immediately before or during target onset (e.g., 400 ms before target onset for analysis of the effect of microsaccade, 60 ms before stimulus onset for the analysis of the effect of neural variability). Other analyses are done on non-target rather than target stimuli. What is the justification for selecting these particular periods for these analyses? The differences in firing rates between hits and misses are restricted to the target epoch and are not present in the non-target epochs. Given these results, it seems important to compare the effects in target and non-target epochs in other analyses as well.

      Restricting the analysis of the Fano Factor to 60 ms before non-target onset seems odd. Given that the duration of the interval between stimulus presentations is random, how could this pre-stimulus effect be time-locked to target onset? 

      We selected a 200ms time window during the pre-stimulus or stimulus-evoked period for almost all our analyses. The results relating to microsaccade occurrence were robust to narrower time windows more consistent with the other pre-stimulus windows we used, but we chose to use the 400ms window to capture a larger fraction of trials with microsaccades. 

      Only the Fano factor time window was selected post hoc based on the traces in Figure 4A, and the result is robust across animals (new Figure 4-figure supplement 1). The inter-stimulus intervals are random, and we do not believe the neural variability is time-locked to upcoming stimuli; rather, lower variability in this pre-stimulus window is characteristic of hits.
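
      For reference, the Fano factor discussed above is the variance of spike counts across trials divided by their mean, computed within a fixed window. The sketch below shows that calculation on hypothetical spike counts; it is not the authors' analysis code, and details such as windowing and mean-matching are omitted.

```python
from statistics import mean, variance


def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across trials (sample variance)."""
    m = mean(spike_counts)
    if m == 0:
        return float("nan")  # undefined when the unit never fires in the window
    return variance(spike_counts) / m


# Hypothetical spike counts in a fixed pre-stimulus window, one per trial.
hit_counts = [4, 5, 4, 6, 5, 5, 4, 5]
miss_counts = [2, 8, 3, 7, 5, 1, 9, 4]

ff_hit = fano_factor(hit_counts)
ff_miss = fano_factor(miss_counts)
print(f"hits: {ff_hit:.3f}, misses: {ff_miss:.3f}")
```

      In this toy example the two conditions have similar mean counts but very different trial-to-trial spread, which is the kind of difference a Fano factor comparison is meant to expose.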

      We believe that the consistency of our results across both animals provides further evidence that our time window selection was appropriate. 

      We are interested in the extent to which these effects would remain consistent when applied only to target stimuli. However, restricting our analyses to only target stimuli substantially reduces the amount of neural data available for analysis. We plan to explore target stimulus representation more thoroughly in future studies.   

      (7) Can the measured neural response be used to discriminate between target and nontarget stimuli? If so, is the discriminability between target and non-target higher in hits vs. misses? 

      Thank you for raising this interesting point. We performed this analysis and find that target stimuli are more discriminable from non-targets in hits compared to misses. This has been added as a new Figure 3A.  

      (8) How many trials were performed per session? Did miss probability tend to increase over time over the session? If so, could this slow change in hit probability account for some of the observed neural and behavioral correlations with perceptual decisions? 

      Monkeys initiated a median of 905 trials per session (range of 651 to 1086). This has been added to the manuscript (Line 106). Approximately 1/8 of those trials were at perceptual threshold. Hit probability at threshold does not change substantially over the course of the session. We now report this in new Figure 1-figure supplement 3I (error bars show standard deviation).

      (9) Did miss probability depend on the time of the change within the trial? If so, do any of the behavioral/neural metrics share a similar within-trial time course? 

      Change times were not significantly different across hit and miss trials (p=0.15, Wilcoxon rank sum test). We now report this in new Figure 1-figure supplement 3H.

      (10) "Deep layer neurons exhibit reduced low-frequency phase-locking in hit trials than in misses (Figure 5B), suggesting an improvement in pooled signal-to-noise among this neural population." - why does this metric suggest improved SNR? Is there any evidence for improved SNR in the data? Why just in deep layers? 

      Thank you for raising this question. We agree this statement is not fully supported by the data and have removed it.  

      (11) I may have missed this but what were the sizes of the Gabor stimuli? 

      This has been added to the methods section (Line 454). The Gaussian halfwidth was 2 degrees.  

      Reviewer #2 (Public Review):  

      In this manuscript, the authors conducted a study in which they measured eye movements, pupil diameter, and neural activity in V4 in monkeys engaged in a visual attention task. The task required the monkeys to report changes in the orientation of Gabors' visual stimuli. The authors manipulated the difficulty of the trials by varying the degree of orientation change and focused their analysis on trials of intermediate difficulty where the monkeys' hit rate was approximately 50%. Their key findings include the following: 1) Hit trials were preceded by larger pupil diameter, reflecting higher arousal, and by more stable eye positions; 2) V4 neurons exhibit larger visual responses in hit trials; 3) Superficial and deep layers exhibited greater coherence in hit trials during both the pre-target stimulus period and the non-target stimulus presentation period. These findings have useful implications for the field, and the experiments and analyses presented in this manuscript validly support the authors' claims. 

      Strengths: 

      The experiments were well-designed and executed with meticulous control. The analyses of both behavioural and electrophysiological data align with the standards in the field. 

      We thank the reviewer for their enthusiasm about our study and their constructive comments which we address below.

      Weaknesses: 

      Many of the findings appear to be incremental compared to previous literature, including the authors' own work. While incremental findings are not necessarily a problem, the manuscript lacks clear statements about the extent to which the dataset, analysis, and findings overlap with the authors' prior research. For example, one of the main findings, which suggests that V4 neurons exhibit larger visual responses in hit trials (as shown in Fig. 3), appears to have been previously reported in their 2017 paper. Additionally, it seems that the entire Fig1-S1 may have been reused from the 2017 paper. These overlaps should have been explicitly acknowledged and correctly referenced. 

      While the raw data used in this paper overlaps entirely with Nandy et al. (2017), all the analyses and findings in this manuscript are new and have not been previously reported. Figure 1-figure supplement 1 is modified and reproduced from that paper only to allow readers to understand the recording methods used to collect the data without needing to go back to the previous paper. We have added an explicit acknowledgment of this to the figure caption.

      Previous studies have demonstrated that attention leads to decorrelation in V4 population activity. The authors should have discussed how and why the high coherence across layers observed in the current study can coexist with this decorrelation. 

      We have updated the discussion section (Lines 347-351) to further elaborate on this interpretation. 

      Furthermore, the manuscript does not explore potentially interesting aspects of the dataset. For instance, the authors could have investigated instances where monkeys made 'false' reports, such as executing saccades towards visual stimuli when no orientation change occurred. It would be valuable to provide the fraction of the monkeys' responses in a session, including false reports and correct rejections in catch trials, to allow for a broader analysis that considers the perceptual component of neural activity over pure sensory responses. 

      We appreciate this feedback. While we agree these are interesting directions, we decided to limit the scope of this study to only focus on trials at threshold with an orientation change, and are considering these directions for future studies. 

      Reviewer #2 (Recommendations For The Authors): 

      • Figure Design: Since eLife does not impose space limitations, it is advisable for the authors to avoid using very small font sizes. Consistency in font size throughout the figures is recommended. Some figures are challenging to discern, for example, the mean+-sem in Fig. 2B, and the alpha values of green and purple colours for superficial/deep layers are too high, making them too transparent or pale. 

      We have increased the size of some small fonts and improved font size consistency throughout the figures. We have changed the layer colors to improve legibility. 

      • Line 119: trail, 

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative volumes of the rostral prefrontal cortex (frontal pole) and dorsolateral prefrontal cortex vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS.

      I have read the revised version of the manuscript with interest. I agree with the authors that a focus on ecological vs laboratory variables is a good one, although it might have been useful to reflect that in the title.

      I am happy to see that the authors included additional analyses using different definitions of FP and DLPFC in the supplementary material. As I said in my earlier review, the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital.

      We thank the reviewer for these positive remarks and for these very useful suggestions on the previous version of this article.

      I am sorry the authors are so dismissive of the idea of looking at the models where brain size and area size are directly compared in the model, rather preferring to run separate models on brain size and area size. This seems to me a sensible suggestion.

      We agree with reviewer 1, and the response of reviewer 3 also made it clear to us why this was an important issue. We have therefore addressed it more thoroughly this time.

      First, we have added a new analysis, with whole brain volume included as covariate in the model accounting for regional volumes, together with the socio-ecological variables of interest. As expected given the very strong correlation across all brain measures (>90%), the effects of all socio-ecological factors disappear for both FP and DLPFC volumes when ‘whole brain’ is included as covariate. This is coherent with our previous analysis showing that the same combination of socio-ecological variables could account for the volume of FP, DLPFC and the whole brain. Nevertheless, the interpretation of these results remains difficult, because of the hidden assumptions underlying the analysis (see below).
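
      The effect described above, where the socio-ecological predictors lose their weight once a nearly collinear whole-brain covariate enters the model, can be illustrated with a minimal generalized least squares sketch. Everything here is an assumption made for illustration: the data are synthetic and noise-free, the covariance matrix `V` is a hypothetical two-clade structure rather than one derived from a real phylogeny, and `gls_fit` is a hypothetical helper, not the authors' PGLS pipeline.

```python
import numpy as np

def gls_fit(X, y, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

n = 16  # number of species, matching the sample size discussed in the text

# Hypothetical phylogenetic covariance: two clades of 8 species,
# variance 1 on the diagonal, covariance 0.5 within a clade, 0 between clades.
V = np.kron(np.eye(2), 0.5 * np.ones((8, 8))) + 0.5 * np.eye(n)

# Synthetic socio-ecological predictor (arbitrary units).
eco = np.linspace(-1.0, 1.0, n)

# Whole-brain volume tracks the predictor, plus a deterministic perturbation.
perturbation = 0.3 * np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
whole = eco + perturbation

# Regional volume is, by construction, fully explained by whole-brain volume.
region = 0.9 * whole

ones = np.ones(n)
X_without = np.column_stack([ones, eco])         # predictor only
X_with = np.column_stack([ones, eco, whole])     # predictor + whole-brain covariate

beta_without = gls_fit(X_without, region, V)
beta_with = gls_fit(X_with, region, V)

print("eco coefficient without covariate:", beta_without[1])
print("eco coefficient with covariate:   ", beta_with[1])
print("whole-brain coefficient:          ", beta_with[2])
```

      In this constructed case the predictor carries a clear coefficient on its own and essentially none once whole-brain volume is included, which says nothing about which model is the better description of real data; it only shows why the two analyses can disagree when the brain measures are very strongly correlated.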

      Second, we have clarified the theoretical reasons that made us choose absolute vs relative measures of brain volumes. In short, we understand the notion of specificity associated with relative measures, but 1) the interpretation of relative measures is confusing and 2) we have alternative ways to evaluate the specificity of the effects (which are complementary to the idea of adding whole brain volume as covariate). 

      Our goal here was to evaluate the influence of socio-ecological factors on specific brain regions, based on their known cognitive functions in laboratory conditions (working memory for the DLPFC and metacognition for the frontal pole). Thus, the null hypothesis is that socio-ecological challenges supposed to mobilize working memory and metacognition do not affect the size of the brain regions associated with these functions (respectively DLPFC and FP). This is what our analysis is testing, and from that perspective, it seems to us that direct measures are better, because within regions (across species), volumes provide a good index of neural counts (since densities are conserved), which are indicative of the amount of computational resources available for the region. It is not the case when using relative measures, or when using the whole brain as covariate, since densities are heterogeneous across brain regions (e.g. Herculano-Houzel, 2011; 2017, but see below for further details on this).

      Quantitatively, the theoretical level of specificity of the relation between brain regions and socio-ecological factors is difficult to evaluate, given that our predictions are based on the cognitive functions associated with DLPFC and FP, namely working memory and metacognition, and that each of these cognitive functions also involves other brain regions. We would actually predict that other brain regions associated with the same cognitive functions as DLPFC or FP also show a positive influence of the same socio-ecological variables. Given that the functional mapping of cognitive functions in the brain remains debated, it is extremely difficult to evaluate quantitatively how specific the influence of the socio-ecological factors should be on DLPFC and FP compared to the rest of the brain, in the frame of our hypothesis.

      Critically, given that FP and DLPFC show a differential sensitivity to population density, a proxy for social complexity, and that this difference is in line with laboratory studies showing a stronger implication of the FP in social cognition, we believe that there is indeed some specificity in the relation between specific regions of the PFC and socioecological variables. Thus, our results as a whole seem to indicate that the relation between prefrontal cortex regions and socio-ecological variables shows a small but significant level of specificity. We hope that the addition of the new analysis and the corresponding modifications of the introduction and discussion section will clarify this point.

      Similarly, the debate about whether area volume and number of neurons can be equated across the regions is an important one, of which they are a bit dismissive.

      We are sorry that the reviewer found us a bit dismissive on this issue, and there may have been a misunderstanding.

      Based on the literature, it is clearly established that for a given brain region, area volume provides a good proxy for the number of neurons, and it is legitimate to generalize this relation across species if neuronal densities are conserved for the region of interest (see for example Herculano-Houzel 2011, 2017 for review). It seems to be the case across primates because cytoarchitectonic maps are conserved for FP and DLPFC, at least in humans and laboratory primates (Petrides et al, 2012; Sallet et al, 2013; Gabi et al, 2016; Amiez et al, 2019). But we make no claim about the difference in number of neurons between FP and DLPFC, and we never compared regional volumes across regions (we only compared the influence of socio-ecological factors on each regional volume), so their difference in cellular density is not relevant here. As long as the neuronal density is conserved across species but within a region (DLPFC or FP), the difference in volume for that region, across species, does provide a reliable proxy for the influence of the socioecological regressor of interest (across species) on the number of neurons in that region.

      Our claims are based on the strength of the relation between 1) cross-species variability in a set of socio-ecological variables and 2) cross-species variability in neural counts in each region of interest (FP or DLPFC). Since the effects of interest relate to inter-specific differences, within a region, our only assumption is that the neural densities are conserved across distinct species for a given brain region. Again (see previous paragraph), there is reasonable evidence for that in the literature. Given that assumption, regional volumes (across species, for a given brain region) provide a good proxy for the number of neurons. Thus, the influence of a given socio-ecological variable on the interspecific differences in the volume of a single brain region provides a reliable estimate of the influence of that socio-ecological variable on the number of neurons in that region (across species), and potentially of the importance of the cognitive function associated with that region in laboratory conditions. None of our conclusions are based on direct comparison of volumes across regions, and we only compared the influence of socioecological factors (beta weights, after normalization of the variables).
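
      The role of normalization when comparing beta weights can be sketched as follows. This is a minimal illustration on made-up, noise-free numbers, assuming ordinary least squares rather than the PGLS used in the study; the variable names (`distance_km`, `density`, `volume`) and the helpers `zscore` and `ols_slopes` are hypothetical.

```python
import numpy as np

def zscore(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)

def ols_slopes(X, y):
    """Ordinary least squares slopes (intercept fitted, then dropped)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1:]

n = 16
# Made-up predictors for 16 hypothetical species.
distance_km = np.linspace(0.5, 5.0, n)           # daily traveled distance
density = np.tile([10.0, 40.0, 25.0, 60.0], 4)   # population density
# Made-up regional volume, an exact linear function of the two predictors.
volume = 3.0 * distance_km + 0.05 * density

# Raw slopes depend on the measurement unit of each predictor.
raw_km = ols_slopes(np.column_stack([distance_km, density]), volume)
raw_m = ols_slopes(np.column_stack([distance_km * 1000.0, density]), volume)

# Standardized slopes (beta weights) do not.
std_km = ols_slopes(
    np.column_stack([zscore(distance_km), zscore(density)]), zscore(volume)
)
std_m = ols_slopes(
    np.column_stack([zscore(distance_km * 1000.0), zscore(density)]), zscore(volume)
)

print("raw slopes (km):", raw_km)
print("raw slopes (m): ", raw_m)
print("beta weights:   ", std_km, std_m)
```

      Expressing distance in meters instead of kilometers divides its raw slope by 1000 but leaves the standardized coefficients unchanged, which is why beta weights, rather than raw slopes, are the quantities that can reasonably be compared across predictors.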

      Note that this is yet another reason for not using relative measures and not including whole brain as covariate in the regression model: Given that whole brain and any specific region have a clear difference in density, and that this difference is probably not conserved across species, relative measures (or covariate analysis) cannot be used as proxies for neuronal counts (e.g. Herculano-Houzel, 2011). In other words, using the whole brain to rescale individual brain regions relies upon the assumption that the ratios of volumes (specific region/whole brain) are equivalent to the ratios of neural counts, which is not valid given the differences in densities.
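
      The arithmetic behind this argument can be made concrete with a small sketch. All numbers below are hypothetical and chosen only for illustration; they are not measurements from any species. The sketch assumes, as in the text, that density within a region is conserved across species while whole-brain density is not.

```python
import math

def neuron_count(volume_mm3, density_per_mm3):
    """Neurons = volume x density (both values are illustrative)."""
    return volume_mm3 * density_per_mm3

# Hypothetical species: regional density is conserved across species,
# whole-brain density is not.
species = {
    "A": dict(region_vol=100.0, brain_vol=1000.0,
              region_density=50_000.0, brain_density=100_000.0),
    "B": dict(region_vol=200.0, brain_vol=2000.0,
              region_density=50_000.0, brain_density=60_000.0),
}

def summarize(s):
    region_n = neuron_count(s["region_vol"], s["region_density"])
    brain_n = neuron_count(s["brain_vol"], s["brain_density"])
    return {
        "volume_ratio": s["region_vol"] / s["brain_vol"],
        "neuron_ratio": region_n / brain_n,
        "region_neurons": region_n,
    }

summary = {name: summarize(s) for name, s in species.items()}
for name, row in summary.items():
    print(name, row)
```

      Under these assumptions the two species have the same relative regional volume but different relative neuron counts, whereas the between-species ratio of absolute regional volumes equals the ratio of regional neuron counts.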

      Nevertheless, I think this is an important study. I am happy that we are using imaging data to answer wider phylogenetic questions. Combining detailed anatomy, big data, and phylogenetic statistical frameworks is an important approach.

      We really thank the reviewer for these positive remarks, and we hope that this study will indeed stimulate others using a similar approach.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors run their studies on brains collected from 1920 to 1970 and preserved in formalin solution. Also, they obtained data from the Muséum national d'Histoire naturelle in Paris and from the Allen Brain Institute in California. The main findings consist in showing that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience. But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We are sorry that the reviewer still believes that these two points are major weaknesses.

      - We have added a point on lissencephalic species in the discussion. In short, we acknowledge that our work may not be applied to lissencephalic species because they cannot be studied with our method, but on the other hand, based on laboratory data there is no evidence showing that the functional organization of the DLPFC and FP in lissencephalic primates is radically different from that of other primates (Dias et al, 1996; Roberts et al, 2007; Dureux et al, 2023; Wong et al, 2023). Therefore, there is no a priori reason to believe that not including lissencephalic primates prevents us from drawing conclusions that are valid for primates in general. Moreover, as explained in the discussion, including lissencephalic primates would require using invasive functional studies, only possible in laboratory conditions, which would not be compatible with the number of species (>15) necessary for phylogenetic studies (in particular PGLS approaches). Finally, as pointed out by the reviewer, our study is also relevant for understanding human brain evolution, and as such, including lissencephalic species should not be critical to this understanding.

      - In response to the remarks of reviewer 1 on the first version of the manuscript, we had included a new analysis in the previous version of the manuscript, to evaluate the validity of our functional maps given another set of boundaries between FP and DLPFC. But one should keep in mind that our objective here is not to provide a definitive definition of what the regions usually referred to as DLPFC and FP should be from an anatomical point of view. Rather, as our study aims at taking into account the phylogenetic relations across primate species, we chose landmarks that enable a comparison of the volume of cortex involved in metacognition (FP) and working memory (DLPFC) across species. We have also updated the discussion accordingly.

      We agree that this is a difficult point and we have always acknowledged that this was a clear limitation in our study. In the light of the functional imaging literature in humans and non-human primates, as well as the neurophysiological data in macaques, defining the functional boundary between FP and DLPFC remains a challenging issue even in very well controlled laboratory conditions. As mentioned by reviewer 1, “the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital”. Again, an additional analysis using different boundaries for FP and DLPFC was included in the supplementary material to address that issue. Now, we are not aware of solid evidence showing that the boundaries that we chose for DLPFC vs FP were wrong, and we believe that the comparison between 2 sets of measures as well as the discussion on this topic should be sufficient for the reader to assess both the strength and the limits of our conclusion. That being said, if the reviewer has any reference in mind showing better ways to delineate the functional boundary between FP and DLPFC in primates, we would be happy to include it in our manuscript.

      - The question of development, which is an important question per se,  is neither part of the hypothesis nor central for the field of comparative cognition in primates. Indeed, major studies in the field do not mention development (e.g. Byrne, 2000; Kaas, 2012; Barton, 2012). De Casien et al (2022) even showed that developmental constraints are largely irrelevant (see Claim 4 of their article): [« The functional constraints hypothesis […] predicts more complex, ‘mosaic’ patterns of change at the network level, since brain structure should evolve adaptively and in response to changing environments. It also suggests that ‘concerted’ patterns of brain evolution do not represent conclusive evidence for developmental constraints, since allometric relationships between developmentally linked or unlinked brain areas may result from selection to maintain functional connectivity. This is supported by recent computational modeling work [81], which also suggests that the value of mosaic or concerted patterns may fluctuate through time in a variable environment and that developmental coupling may not be a strong evolutionary constraint. Hence, the concept of concerted evolution can be decoupled from that of developmental constraints »].

      Finally, when studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Schutz, 2017; MacLean et al, 2012; Mars et al, 2018; 2021). Therefore, development does not seem to be a critical issue, neither for our article nor for the field.

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis).

      We thank the reviewer for his/her remarks, and for the clarification of his/her criticism regarding the use of relative measures. We are sorry to have missed the importance of this point in the first place. We also thank the reviewer for the cited references, which were very interesting and which we have included in the discussion. As reviewer 1 also shared these concerns, we wrote a detailed response above to explain how we addressed the issue.

      First, we did run a supplementary analysis where whole brain volume was added as covariate, together with socio-ecological variables, to account for the volume of FP or DLPFC. As expected given the very high correlation across all 3 brain measures, none of the socio-ecological variables remained significant. We have added a long paragraph in the discussion to tackle that issue. In short, we agree with the reviewer that the specificity of the effects (on a given brain region vs the rest of the brain) is a critical issue, and we acknowledge that since this is a standard in the field, it was necessary to address the issue and run this extra analysis. But we also believe that specificity could be assessed by other means: given the differential influence of ‘population density’ on FP and DLPFC, in line with laboratory data, we believe that some of the effects that we describe do show specificity. Also, we prefer absolute measures to relative measures because they provide a better estimate of the corresponding cognitive operation, because standard allometric rules (i.e., body size or whole brain scaling) may not apply to the scaling and evolution of FP and DLPFC in primates. Indeed, given that we use these measures as proxies of functions (metacognition for FP and working memory for DLPFC), it is clear that other parts of the brain should show the same effect since these functions are supported by entire networks that include not only our regions of interest but also other cortical areas in the parietal lobe. Thus, the extent to which the relation with socio-ecological variables should be stronger in regions of interest vs the whole brain depends upon the extent to which other regions are involved in the same cognitive function as our regions of interest, and this is clearly beyond the scope of this study.

      More importantly, volumetric measures are taken as proxies for the number of neurons, but this is only valid when comparing data from the same brain region (across species), and not across brain regions, since neural densities are not conserved. Thus, using relative measures (scaling with the whole brain volume) would only work if densities were conserved across brain regions, but it is not the case. From that perspective, the interpretation of absolute measures seems more straightforward, and we hope that the specificity of the effects could be evaluated using the comparison between the 3 measures (FP, DLPFC and whole brain) as well as the analysis suggested by the reviewer. We hope that the additional analysis and the updated discussion will be sufficient to cover that question, and that the reader will have all the information necessary to evaluate the level of specificity and the extent to which our findings can be interpreted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In my previous review of the present manuscript, I pointed out the fact that defining parts, modules, or regions of the primate cerebral cortex based on macroscopic landmarks across primate species is problematic because it prevents comparisons between gyrencephalic and lissencephalic primate species. The authors have rephrased several paragraphs in their manuscript to acknowledge that their findings do apply to gyrencephalic primates.

      I also said that "Contemporary developmental biology has showed that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC during development. Otherwise, the claims from the mosaic brain and modularity lack fundamental support". I insisted that the authors should clarify their concept of homology of cerebral cortex parts, modules, or regions across species (in the present manuscript, the frontal pole and the dorsolateral prefrontal cortex). Those are not trivial questions because any phylogenetic explanation of brain region expansion in contemporary phylogenetic and evolutionary biology must be rooted in evolutionary developmental biology. In this regard, the authors could have discussed their findings in the frame of contemporary studies of cerebral cortex evolution and development, but, instead, they have rejected my criticism just saying that they are "not relevant here" or "clearly beyond the scope of this paper".

      The question of development, which is an important question per se, is neither part of the hypothesis nor central for the field of comparative cognition in primates. Indeed, the major studies in the field do not mention development and some even showed that developmental constraints were not relevant (see De Casien et al., 2022 and details in our response to the public review). When studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Schutz, 2017; MacLean et al, 2012; Mars et al, 2018; 2021).

      If the other reviewers agree, the authors are free to publish in eLife their correlations in a vacuum of evolutionary developmental biology interpretation. I just disagree. Explanations of neural circuit evolution in primates and other mammalian species should tend to standards like the review in this link: https://royalsocietypublishing.org/doi/full/10.1098/rstb.2020.0522

      In this article, Paul Cisek (a brilliant neurophysiologist) speculates on potential evolutionary mechanisms for some primate brain functions, but there is surprisingly very little reference to the existing literature on primate evolution and cognition. There is virtually no mention of studies that involve a large enough number of species to address evolutionary processes and/or a comparison with fossils and/or an evaluation of specific socio-ecological evolutionary constraints. Most of the cited literature refers to laboratory studies on brain anatomy of a handful of species, and their relevance for evolution remains to be evaluated. These ideas are very interesting and they could definitely provide an original perspective on evolution, but they are mostly based on speculations from laboratory studies, rather than from extensive comparative studies. This paper is interesting for understanding developmental mechanisms and their constraints on neurophysiological processes in laboratory conditions, but we do not think that it would fit in the framework of our paper as it goes far beyond our main topic.

      Reviewer #3 (Recommendations For The Authors):

      Yes, I am suggesting that the authors also include analyses with brain size (rather than body size) as a covariate to evaluate the effects of other variables in the model over and above the effect on brain size. In a very simplified theoretical scenario: two species have the same body sizes, but species A has a larger brain and therefore a larger FP. In this case, species A has a larger FP because of brain allometric patterns, and models including body size as a covariate would link FP size and socioecological variables characteristic of species A (and others like it). However, perhaps the FP of species A is actually smaller than expected for its brain size, while the FP of species B is larger than expected for its brain size.

      As explained in our response to the public review, we did run this analysis and we agree with the reviewer’s point from a practical point of view: it is important to know the extent to which the relation with a set of socio-ecological variables is specific to the region of interest, vs less specific and present for other brain regions. Again, we are sorry to not have understood that earlier, and we acknowledge that since it is a standard in the field, it needs to be addressed thoroughly.

      We understand the scaling intuition, and the need to get a reference point for volumetric measures, but here the volume of each brain region is taken as a proxy for the number of neurons and therefore for the region’s computational capacities. Since, for a given brain region (FP or DLPFC) the neural densities seem to be well conserved across species, comparing regional volumes across species provides a good proxy for the contrast (across species) in neural counts for that region. All we predicted was that for a given brain region, associated with a given cognitive operation, the volume (number of neurons) would be greater in species for which socio-ecological constraints potentially involving that specific cognitive operation were greater. We do not understand how or why the rest of the brain would change this interpretation (of course, as discussed just above, beyond the question of specificity). And using whole brain volume as a scaling measure is problematic because the whole brain density is very different from the density of these regions of the prefrontal cortex (see above for further details). Again, we acknowledge that allometric patterns exist, and we understand how they can be interpreted, but we do not understand how it could prove or disprove our hypothesis (brain regions involved in specific cognitive operations are influenced by a specific set of socio-ecological variables). When using volumes as a proxy for computational capacities, the theoretical implications of scaling procedures might be problematic. For example, it implies that the computational capacities of a given brain region are scaled by the rest of the brain. All other things being equal, the computational capacities of a given brain region, taken as the number of neurons, should decrease when the size of the rest of the brain increases. But to our knowledge there is no evidence for that in the literature.

      Clearly these are very challenging issues, and our position was to take absolute measures because they do not rely upon hidden assumptions regarding allometric relations and their consequence on cognition.

      But since we definitely understand that scaling is a reference in the field, we have not only completed the corresponding analysis (including the whole brain as a covariate, together with socio-ecological variables) but also expanded the discussion to address this issue in detail. We hope that between this new analysis and the comparison of effects between non-scaled measures of FP, DLPFC and the whole brain, the reader will be able to judge the specificity of the effect.

      Models including brain (instead of body) size would instead link FP size and socioecological variables characteristic of species B (and others like it). This approach is supported by a large body of literature linking comparative variation in the relative size of specific brain regions (i.e., relative to brain size) to behavioral variation across species - e.g., relative size of visual/olfactory brain areas and diurnality/nocturnality in primates (Barton et al. 1995), relative size of the hippocampus and food caching in birds (Krebs et al. 1989).

      Barton, R., Purvis, A., & Harvey, P. H. (1995). Evolutionary radiation of visual and olfactory brain systems in primates, bats and insectivores. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 348(1326), 381-392.

      Krebs, J. R., Sherry, D. F., Healy, S. D., Perry, V. H., & Vaccarino, A. L. (1989). Hippocampal specialization of food-storing birds. Proceedings of the National Academy of Sciences, 86(4), 1388-1392. 

      We are grateful to the reviewer for mentioning these very interesting articles, and more generally for helping us to understand this issue and clarify the related discussion. Again, we understand the scaling principle but the fact that these methods provide interesting results does not make other approaches (such as ours) wrong or irrelevant. Since we have used both our original approach and the standard version as requested by the reviewer, the reader should be able to get a clear picture of the measures and of their theoretical implications. We sincerely hope that the present version of the paper will be satisfactory, not only because it is clearer, but also because it might stimulate further discussion on this complex question.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents useful findings on several phage from deep sea isolates of Lentisphaerae strains WC36 and zth2 that further our understanding of deep sea microbial life. The manuscript's primary claim is that phage isolates augment polysaccharide use in Pseudomonas bacteria via auxiliary metabolic genes (AMGs). However, the strength of the evidence is incomplete and does not support the primary claims. Namely, there are no data presented to rule out phage contamination in the polysaccharide stock solution, AMGs are potentially misidentified, and there is missing evidence of successful infection.

      Thanks for the Editor’s and Reviewers’ positive and constructive comments, which helped us improve the quality of our manuscript entitled “Deep-sea bacteriophages facilitate host utilization of polysaccharides” (paper#eLife-RP-RA-2023-92345). The comments are valuable, and we have studied them carefully and have made corresponding revisions according to the suggestions. We removed some uncertain results and strengthened other parts of the manuscript, which evidently improved the accuracy and impact of the revised version. Revised portions are marked in blue in the modified manuscript. Please find the detailed responses as follows.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This manuscript describes the identification and isolation of several phage from deep sea isolates of Lentisphaerae strains WC36 and zth2. The authors observe induction of several putative chronic phages with the introduction of additional polysaccharides to the media. The authors suggest that two of the recovered phage genomes encode AMGs associated with polysaccharide use. The authors also suggest that adding the purified phage to cultures of Pseudomonas stutzeri 273 increased the growth of this bacterium due to augmented polysaccharide use genes from the phage. While the findings were of interest and relevance to the field, it is my opinion that several of the analyses fall short of supporting the key assertions presented.

      Thanks for your comments. We removed some uncertain results and strengthened other parts of the manuscript, which evidently improved the accuracy and impact of the revised version. Please find the detailed responses as follows.

      Strengths: Interesting isolate of deep sea Lentisphaerae strains which will undoubtedly further our understanding of deep sea microbial life.

      Thanks for your positive comments.  

      Weaknesses:

      (1) Many of the findings are consistent with a phage contamination in the polysaccharide stock solution. 

      Thanks for your comments. We are confident that the phages are derived specifically from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons: (1) the polysaccharide stock solution was strictly sterilized to remove any phage contamination; (2) we performed multiple TEM checks of the rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and found no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used were not contaminated with phages; in addition, we observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2); (3) the polysaccharide (starch) alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated the growth of Pseudomonas stutzeri 273 (Author response image 1). These results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      Author response image 1.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36. 

       

      (2) The genes presented as AMGs are largely well known and studied phage genes which play a role in infection cycles.

      Thanks for your comments. Indeed, these AMGs may be common only in virulent phages and have never been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis or the injection of viral DNA upon infection. Chronic phages, however, do not lyse the host. Therefore, the persistence of these genes in chronic phages may be due to their ability to assist the host in metabolizing polysaccharides. Finally, following your suggestions, we have toned down the role of AMGs and added “potential” in front of the term. The detailed information is shown below.

      (3) The evidence that the isolated phage can infect Pseudomonas stutzeri 273 is lacking, putting into question the dependent results.

      Thanks for your comments. We tested many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could assist them in degrading and utilizing polysaccharides, and found that Phages-WC36 promoted the growth of strain 273 only. Filamentous phages have been reported to recognize and bind to the host pili, which causes the pili to shrink and brings the filamentous phages closer to, and possibly through, the outer membrane of host cells. Other chronic phages might be released without breaking the host by being enclosed in a lipid membrane and leaving the host cells in a nonlytic manner. Thus, these chronic phages may have a wider host range. However, we were unable to further reveal the infection mechanism because the necessary techniques were not available to us. Therefore, following your suggestions, we have deleted this section in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I have previously reviewed this manuscript as a submission to another journal in 2022. My recommendations here mirror those of my prior suggestions, now with further added details.

      Thank you for your great efforts in reviewing both the previous and the current versions of our manuscript, and for your valuable suggestions.

      Specific comments:

      Comment 1: Line 32. Rephrase to "polysaccharides cause the induction of multiple temperate phages infecting two strains of Lentisphaerae (WC36 and zth2) from the deep sea."

      Thanks for your positive suggestion. We have modified this description as “Here, we found for the first time that polysaccharides induced the production of multiple temperate phages infecting two deep-sea Lentisphaerae strains (WC36 and zth2).” in the revised manuscript (Lines 31-33). 

      Comment 2: Line 66. "Chronic" infections are not "lysogenic" as described here, suggesting the former is a subcategory of the latter. If you are going to introduce lifecycles you need a brief sentence distinguishing "chronic" from "lysogenic"

      Thanks for your positive suggestion. We added this sentence as “Currently, more and more attention has been paid to chronic life cycles where bacterial growth continues despite phage reproduction (Hoffmann Berling and Maze, 1964), which was different from the lysogenic life cycle that could possibly lyse the host under some specific conditions.” in the revised manuscript (Lines 66-69).

      Comment 3: Line 72. Please avoid generalized statements like "a hand-full" (or "plenty" line 85). Try to be at least somewhat quantitative regarding how many chronic phages are known. This is a fairly common strategy among archaeal viruses. 

      Thanks for your suggestion. Given that some filamentous phages also have a chronic life cycle that is not explicitly reported, we cannot accurately estimate their numbers. According to your suggestions, we have modified these descriptions as “however, to our best knowledge, only few phages have been described for prokaryotes in the pure isolates up to date (Roux et al., 2019; Alarcón-Schumacher et al., 2022; Liu et al., 2022).” in the revised manuscript (Lines 73-75). In addition, the number of chronic phages in the biosphere cannot be accurately estimated, according to the latest report (Chevallereau et al., 2022), which showed that “a large fraction of phages in the biosphere are produced through chronic life cycles”. Therefore, we have modified this description as “Therefore, a large percentage of phages in nature are proposed to replicate through chronic life cycles” in the revised manuscript (Lines 87-88). 

      Comment 4: Line 93. While Breitbart 2012 is a good paper to cite here, there have been several much more advanced analyses of the ocean virome. https://doi.org/10.1016/j.cell.2019.03.040 is one example, but there are several others. A deeper literature review is required in this section.

      Thanks for your valuable suggestions. We have added several references and modified this description as “A majority of these viruses are bacteriophages, which exist widely in oceans and affect the life activities of microbes (Breitbart, 2012; Roux et al., 2016; Gregory et al., 2019; Dominguez-Huerta et al., 2022).” in the revised manuscript (Lines 94-97).

      References related to this response:

      Roux, S., Brum, J.R., Dutilh, B.E., Sunagawa, S., Duhaime, M.B., Loy, A., Poulos, B.T., Solonenko, N., Lara, E., Poulain, J., et al. (2016) Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537:689-693. 

      Gregory, A.C., Zayed, A.A., Conceição-Neto, N., Temperton, B., Bolduc, B., Alberti, A., Ardyna, M., Arkhipova, K., Carmichael, M., Cruaud, C., et al. (2019) Marine DNA Viral Macro- and Microdiversity from Pole to Pole. Cell 177:1109-1123.e1114. 

      Dominguez-Huerta, G., Zayed, A.A., Wainaina, J.M., Guo, J., Tian, F., Pratama, A.A., Bolduc, B., Mohssen, M., Zablocki, O., Pelletier, E., et al. (2022) Diversity and ecological footprint of Global Ocean RNA viruses. Science 376:1202-1208.

      Comment 5: Line 137. I see the phage upregulation in Figure 1, however in the text and figure it would be good to also elaborate on what the background expression generally looks like. Perhaps a transcriptomic read normalization and recruitment to the genome with a display of the coverage map, highlighting the prophage would be helpful. Are the polysaccharides directly influencing phage induction or is there some potential for another cascading effect?

      Thanks for your comments. We have listed the expression levels of all phage-associated genes under the different conditions in Supplementary Table 1, which shows that background expression was very low. The numbers in Fig. 1C are the log2-transformed expression changes of strain WC36 genes in rich medium supplemented with 10 g/L laminarin relative to rich medium alone.

      In addition, our RT-qPCR results (Fig. 1D) also confirmed that these genes encoding phage-associated proteins were significantly upregulated when 10 g/L laminarin was added in the rich medium. According to your suggestions, we have modified this description as “In addition to the up-regulation of genes related to glycan transport and degradation, when 10 g/L laminarin was added in the rich medium, the most upregulated genes were phage-associated (e. g. phage integrase, phage portal protein) (Fig. 1C and Supplementary Table 1), which were expressed at the background level in the rich medium alone.” in the revised manuscript (Lines 136-140). Based on the present results, we speculate that polysaccharides might directly induce phage production, which needs to be verified by a large number of experiments in the future.
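
      To illustrate how a log2 expression change of the kind reported in Fig. 1C is typically derived, a minimal sketch follows. It is not the analysis code used in the study: the function name, the pseudocount, and the example values are hypothetical and are not taken from Supplementary Table 1.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount is an assumption added here to keep the ratio defined
    when background expression is zero; the study may have used another scheme.
    """
    if treated < 0 or control < 0:
        raise ValueError("expression values must be non-negative")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression values for one phage-associated gene:
# laminarin-supplemented medium versus rich medium alone.
example_change = log2_fold_change(treated=255.0, control=3.0)
```

      A gene with near-zero background expression and strong induction therefore appears as a large positive value, which is how the phage-associated genes stand out in such a comparison.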

      Comment 6: Line 179. We need some assurance that phage was not introduced by your laminarin or starch supplement. Perhaps a check on the TEM/sequencing check of supplement itself would be helpful? This may be what is meant on Line 188 "without culturing bacterial cells" however this is not clearly worded if that is the case. Additional note, further reading reinforces this as a key concern. Many of the subsequent results are consistent with a contaminated starch stock. 

      Thanks for your comments. We are confident that the phages are derived specifically from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons. (1) We performed multiple TEM checks of the rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and found no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used are not contaminated with phages. In addition, we observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2). Following your suggestions, we have modified this description as “We also tested and confirmed that there were not any phage-like structures in rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or in 10 g/L starch alone (Supplementary Fig. 1B), ruling out the possibility of phage contamination from the polysaccharides (laminarin/ starch).” in the revised manuscript (Lines 158-162) and “Meanwhile, we also checked the polysaccharides (laminarin/ starch) in rich medium directly by TEM and did not find any phage-like structures (Supplementary Fig. 2).” in the revised manuscript (Lines 178-180). (2) The polysaccharide stock solution was strictly sterilized to remove any phage contamination. (3) The polysaccharide (starch) alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated the growth of Pseudomonas stutzeri 273 (Author response image 1). These results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      In addition, given that polysaccharides are a critical energy source for most microorganisms, we asked whether polysaccharides also induce the production of bacteriophages in other deep-sea bacteria. To this end, we cultured deep-sea representatives from four other phyla (Chloroflexi, Tenericutes, Proteobacteria, and Actinobacteria) in medium supplemented with laminarin/starch, and checked the supernatant of the cell suspensions by TEM as described above. We could not find any phage-like structures in these cell suspensions (Author response image 3), which also confirmed that there was no phage contamination in the polysaccharides.

      Author response image 2.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36.   

      Author response image 3.

      TEM observation of the supernatant of cell suspensions of a Chloroflexi strain, a Tenericutes strain, a Proteobacteria strain and an Actinobacteria strain cultivated in rich medium supplemented with 10 g/L laminarin and 10 g/L starch. No phage-like particles could be observed.

      Comment 7: Line 223. Correct generalized wording "long time". 

      Thanks for your comments. We have changed “after for a long time” to “after 30 days” in the revised manuscript (Line 197).

      Comment 8: Line 229. Please more explicitly describe what these numbers are (counts of virion like structures - filamentous and hexagonal respectively?), the units (per µL?), and how these were derived. The word "around" should be replaced with mean and standard deviation values for each count from replicates, without which these are not meaningful.

      Thanks for your comments. The average numbers of filamentous and hexagonal phages per microliter (µL) in each condition were calculated from ten randomly chosen TEM images. Following your suggestions, we have modified this description as “Specifically, the average number per microliter of filamentous phages (9.7, 29 or 65.3) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4.3, 13.7 or 35.3) (Fig. 3B). The average number per microliter of hexagonal phages (9, 30, 46.7) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4, 11.3 or 17.7) (Fig. 3C).” in the revised manuscript (Lines 203-210).
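
      The mean and standard deviation requested in this comment could be summarized per condition as in the sketch below. This is an illustration only: the function name is hypothetical, and the ten counts are invented placeholders, not measurements from Fig. 3.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_counts(counts: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-image particle counts."""
    if len(counts) < 2:
        raise ValueError("at least two replicate counts are needed for a standard deviation")
    return mean(counts), stdev(counts)


# Hypothetical counts from ten TEM images of one condition (placeholder data).
hypothetical_counts = [8, 10, 9, 11, 12, 9, 10, 8, 11, 12]
avg, sd = summarize_counts(hypothetical_counts)
summary = f"{avg:.1f} ± {sd:.1f} particles per µL"
```

      Reporting each condition in this "mean ± SD" form makes the spread across images visible alongside the average.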

      Comment 9: Line 242. This section should be included in the discussion of Figure 2 - around line 194.

      Thanks. According to your suggestion, we have moved this section to the discussion corresponding to Figure 2 (Lines 183-191).

      Comment 10: Figure 3. Stay consistent in the types of figures generated per strain. Figure 3A should be a growth curve.

      Thanks for your comments. Figure 3A is in fact a growth curve; the corresponding description, “(A) Growth curve of strain WC36 cultivated in either rich medium alone or rich medium supplemented with 5 g/L or 10 g/L laminarin for 30 days.”, is given in the legend of Figure 3A.

      Comment 11: Line 312. Move the discussion of AMGs to after the discussion of the phage genome identification.

      Thanks for your valuable comments. According to your suggestions, we have moved the discussion of AMGs to after the discussion of the phage genome identification.

      Comment 12: Line 312. It would be informative to sequence in-bulk each of your treatments as opposed to just sequencing the viral isolates (starch and no host included) to see what viruses can be identified in each. ABySS is also not a common assembler for viral analysis. Is there literature to support it as a sufficient tool in assembling viral genomes? What sequencing depths were obtained in your samples?

      Thanks for your comments. In previous studies, we sequenced the starch and laminarin alone (no host included) and did not detect any phage-related sequences. The ABySS software is described in the following papers (Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res. 2017 May;27(5):768-777; Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009 Jun;19(6):1117-23.), and it has also been used for genome assembly, including of viral genomes, in other studies (Guo Y, Jiang T. First Report of Sugarcane Mosaic Virus Infecting Goose Grass in Shandong Province, China. Plant Dis. 2024 Mar 21. doi: 10.1094/PDIS-11-23-2514-PDN; Tang M, Chen Z, Grover CE, Wang Y, Li S, Liu G, Ma Z, Wendel JF, Hua J. Rapid evolutionary divergence of Gossypium barbadense and G. hirsutum mitochondrial genomes. BMC Genomics. 2015 Oct 12;16:770.). The sequencing depths for the phages of strains WC36 and zth2 were 350x and 365x, respectively.
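
      For context, mean sequencing depth is commonly estimated as the total number of sequenced bases divided by the length of the target genome. The sketch below shows that arithmetic only. The function name and the input values are hypothetical, and it is not the pipeline used to obtain the 350x and 365x figures above.

```python
def mean_depth(total_sequenced_bases: int, genome_length: int) -> float:
    """Estimate mean depth as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if total_sequenced_bases < 0:
        raise ValueError("total_sequenced_bases must be non-negative")
    return total_sequenced_bases / genome_length


# Hypothetical example: 3.5 Mb of reads assigned to a 10 kb phage contig.
example_depth = mean_depth(total_sequenced_bases=3_500_000, genome_length=10_000)
```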

      Comment 13: Line 323. Replace "eventually" with more detail about what was done to derive the genomes. Were these the only four sequences identified as viral?

      Thanks for your comments. We used the ABySS software (http://www.bcgsc.ca/platform/bioinfo/software/abyss) to perform genome assembly with multiple k-mer parameters. VIBRANT v1.2.1 (Kieft et al., 2020), DRAM-v (Shaffer et al., 2020), VirSorter v1.0.5 (with categories 1 (“pretty sure”) and 2 (“quite sure”)) (Roux et al., 2015) and VirFinder v1.1 (with statistically significant viral prediction: score > 0.9 and P-value < 0.05) (Ren et al., 2017), all with default parameters, were used to identify viral genomes among the assembled sequences by searching against both the cultured and non-cultured viral NCBI-RefSeq database (http://blast.ncbi.nlm.nih.gov/) and the IMG/VR database (Camargo et al., 2023). The GapCloser software (https://sourceforge.net/projects/soapdenovo2/files/GapCloser/) was subsequently applied to fill the remaining local inner gaps and correct single-base polymorphisms in the final assembly. All the detailed procedures are described in the supplementary information. Only these four viral sequences had high scores, but they are not complete genomes. Shorter viral sequences with lower scores were excluded.
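
      The VirSorter and VirFinder cutoffs quoted above can be expressed as a simple filter, sketched below. This is an illustration under stated assumptions, not the study's code: the record type, field names, and example contigs are hypothetical, and accepting a contig when either tool passes is an assumption, since the response does not state how the tools' calls were combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContigCall:
    """Hypothetical per-contig summary of two tools' predictions."""
    name: str
    virsorter_category: Optional[int]  # None when VirSorter made no call
    virfinder_score: float
    virfinder_pvalue: float


def passes_virsorter(call: ContigCall) -> bool:
    # Categories 1 and 2 are the ones retained in the response.
    return call.virsorter_category in (1, 2)


def passes_virfinder(call: ContigCall) -> bool:
    # Thresholds quoted in the response: score > 0.9 and P-value < 0.05.
    return call.virfinder_score > 0.9 and call.virfinder_pvalue < 0.05


def select_viral(calls: List[ContigCall]) -> List[str]:
    # Assumption: a contig is kept if either tool supports it.
    return [c.name for c in calls if passes_virsorter(c) or passes_virfinder(c)]


# Hypothetical contigs (placeholder values).
calls = [
    ContigCall("contig_a", 1, 0.95, 0.01),
    ContigCall("contig_b", None, 0.50, 0.40),
    ContigCall("contig_c", 3, 0.93, 0.02),
]
kept = select_viral(calls)
```

      Requiring agreement between tools instead (replacing `or` with `and`) would give a stricter set, which is one way shorter, lower-scoring sequences end up excluded.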

      Comment 14: Line 328. We need some details about the host genomes here. How were these derived? What is their completeness/contamination? What is their size? If the bins are poor, these would not serve as a reliable comparison to identify integrated phage.

      Thanks for your comments. For genomic sequencing, strains WC36 and zth2 were grown in liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). Thereafter, genome sequencing was carried out with both the Illumina NovaSeq PE150 (San Diego, USA) and Nanopore PromethION (Oxford, UK) platforms at Beijing Novogene Bioinformatics Technology Co., Ltd. Library construction, sequencing, and assembly were performed as previously described (Zheng et al., 2021). We used seven databases to predict gene functions, including Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole genome BLAST search (E-value less than 1e-5, minimal alignment length percentage larger than 40%) was performed against the above seven databases.

      The genomes of strains WC36 and zth2 were 100% complete, as assessed with CheckM v1.2.2. The genome sizes of strains WC36 and zth2 were 3,660,783 bp and 3,198,720 bp, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in the GenBank database with accession numbers CP085689 and CP071032, respectively.

      Moreover, to verify the absence of host contamination in the phage sequencing results, we used the alignment algorithm BWA-MEM (version 0.7.15) to map the host WGS reads to these phages. We found that none of the raw reads of the host strains (WC36 and zth2) mapped to these phage sequences (Author response image 4, shown below). In addition, we evaluated the assembly graph underlying the host consensus assemblies. Clean reads were mapped to the complete bacterial genome sequences with Bowtie 2 (version 2.5.0), BWA (version 0.7.8) and SAMTOOLS (version 0.1.18). The results showed that the total mismatch rates of strains WC36 and zth2 were almost 0% and 0.03%, respectively (Author response table 1, shown below). In addition, we collected cells of strains WC36 and zth2 and sent them to another company for whole genome sequencing (named WC36G and ZTH, GenBank accession numbers CP151801 and CP119760, respectively). The genomes of strains WC36G and ZTH were also 100% complete. The genome sizes of strains WC36G and ZTH were 3,660,783 bp and 3,198,714 bp, respectively. The raw reads of strains WC36G and zth2 also did not map to the phage sequences. Therefore, we can confirm that these bacteriophage genomes were completely outside of the host chromosomes.
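
      The BLAST cutoffs stated above (E-value below 1e-5 and alignment length percentage above 40%) amount to the filter sketched below. The function and parameter names are hypothetical. Measuring the alignment percentage against the query length is also an assumption, because the response does not say which length was used as the denominator.

```python
def keep_hit(
    evalue: float,
    alignment_length: int,
    query_length: int,
    max_evalue: float = 1e-5,
    min_alignment_pct: float = 40.0,
) -> bool:
    """Return True when a hit meets both cutoffs (strict inequalities)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # Assumption: percentage is computed relative to the query length.
    alignment_pct = 100.0 * alignment_length / query_length
    return evalue < max_evalue and alignment_pct > min_alignment_pct


# Hypothetical hits (placeholder values).
strong_hit = keep_hit(evalue=1e-10, alignment_length=50, query_length=100)
weak_hit = keep_hit(evalue=1e-3, alignment_length=90, query_length=100)
```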

      Author response image 4.

      The read mapping from WGS to phage sequences.

      Author response table 1.

      Sequencing depth and coverage statistics.

      References related to this response:

      Zheng, R., Liu, R., Shan, Y., Cai, R., Liu, G., and Sun, C. (2021b) Characterization of the first cultured free-living representative of Candidatus Izemoplasma uncovers its unique biology ISME J 15:2676-2691. 

      Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Nat Genet 25:25-29. 

      Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., and Hattori, M. (2004) The KEGG resource for deciphering the genome Nucleic Acids Res 32:D277-280. 

      Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database Nucleic Acids Res 43:D261-269. 

      Bairoch, A., and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 Nucleic Acids Res 28:45-48.

      Comment 15: Line 333. This also needs some details. What evidence do you have that these are not chromosomal? If not chromosomal where can they be found? Sequencing efforts should also be able to yield extrachromosomal elements such as plasmids etc... If you were to sequence your purified isolate cultures from the rich media alone and include all assemblies (not just those binned for example) as a reference, would you be able to recruit viral reads? The way this reads suggests that Chevallereau et al., worked specifically with these phage, which is not the case - please rephrase.

      Thanks for your comments. We carefully compared the bacteriophage genomes with those of the corresponding hosts (strains WC36 and zth2) using Galaxy Version 2.6.0 (https://galaxy.pasteur.fr/) (Afgan et al., 2018) with the NCBI BLASTN method, and used BWA-MEM to map reads from host whole genome sequencing (WGS) to these bacteriophages. Both analyses showed that the bacteriophage genomes are completely outside of the host chromosomes. Therefore, we hypothesize that the phage genomes might exist in the host in a plasmid-like form.

      Comment 16: Line 335. More to the point here that we need confirmation that these phages were not introduced in the polysaccharide treatment

      Thanks for your comments. Please see our responses to comment 1 in the “Weaknesses” part and to Comment 6 in the “Recommendations For The Authors” part.

      Comment 17: Line 342. Lacking significant detail here. Phylogeny based on what gene(s), how were the alignments computed/refined, what model used etc..?

      Thanks for your comments. Following your suggestions, all the related information is given in the “Materials and methods” section of the manuscript. The maximum likelihood phylogenetic tree of Phage-WC36-2 and Phage-zth2-2 was constructed based on the terminase large subunit protein (terL). The proteins used to construct the phylogenetic trees were all obtained from the NCBI databases. All the sequences were aligned by MAFFT version 7 (Katoh et al., 2019) and manually corrected. The phylogenetic trees were constructed using the W-IQ-TREE web server (http://iqtree.cibiv.univie.ac.at) with the “GTR+F+I+G4” model (Trifinopoulos et al., 2016). Finally, we used the online tool Interactive Tree of Life (iTOL v5) (Letunic and Bork, 2021) to edit the tree.

      Comment 18: Line 346. How are you specifically defining AMGs in this study? Most of these are well-known and studied phage genes with specific life cycle functions and could not be considered as polysaccharide processing AMGs even though in host cells many do play a role in polysaccharide processing systems. A substantially deeper literature review is needed in this section, which would ultimately eliminate most of these from the potential AMG pools. Further, the simple HMM/BLASTp evalues are not sufficient to support the functional annotation of these genes. At a minimum, catalytic/conserved regions should be identified, secondary structures compared, and phylogenetic analysis (where possible) developed etc... My recommendation is to eliminate this section entirely from the manuscript. 

      Categorically:

      - Glycoside hydrolase (various families), glucosaminidases, and transglycosylase are all very common to phage and operate generally as lysins, facilitating the release of virions from the host cell upon lysis, or injection of viral DNA upon infection https://doi.org/10.3389/fmicb.2016.00745 (and citations therein) https://doi.org/10.1016/j.cmi.2023.10.018 etc... In order to confirm these as distinct AMGs we would need a very detailed analysis indicating that these are not phage infection cycle/host recognition related, however I strongly suspect that under such interrogation, these would prove to be as such.

      -TonB related systems including ExbB are well studied among phages as part of the trans-location step in infection. These could not be considered as AMGs. https://doi.org/10.1128/JB.00428-19. Other TonB dependent receptors play a role in host recognition.

      -Several phage acetyltransferases play a role in suppressing host RNA polymerase in order to reserve host cell resources for virion production, including polysaccharide production. https://doi.org/10.3390/v12090976. Further it has been shown that the E. coli gene neuO (O-acetyltransferase) is a homologue of lambdoid phage tail fiber genes https://doi.org/10.1073/pnas.0407428102. I suspect the latter is also the case here and this is a tail fiber gene.

      Thanks for your valuable comments. Following your suggestions, we have reanalyzed these AMGs and made some modifications (the new version of Fig. 5A, shown below). These genes encoding proteins associated with polysaccharide transport and degradation may be common only in virulent phages and have never been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis or the injection of viral DNA upon infection, whereas chronic phages do not lyse the host. Filamentous phages have been reported to recognize and bind to the host pili, which causes the pili to shrink and brings the filamentous phages closer to, and possibly through, the outer membrane of host cells (Riechmann et al., 1997; Sun et al., 1987). Other chronic phages might be released without breaking the host by being enclosed in a lipid membrane and leaving the host cells in a nonlytic manner. It has recently been reported that tailless Caudoviricetes phage particles are enclosed in a lipid membrane and released from the host cells in a nonlytic manner (Liu et al., 2022), and that prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Therefore, the persistence of these genes in chronic phages may be due to their ability to assist the host in metabolizing polysaccharides.

      Finally, following your suggestions, we have toned down the role of AMGs and added “potential” in front of the term.

      References related to this response:

      Riechmann L, Holliger P. (1997) The C-terminal domain of TolA is the coreceptor for filamentous phage infection of E. coli Cell 90:351-60.

      Sun TP, Webster RE. (1987) Nucleotide sequence of a gene cluster involved in entry of E colicins and single-stranded DNA of infecting filamentous bacteriophages into Escherichia coli J Bacteriol 169:2667-74. 

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. (2022) Chronic release of tailless phage particles from Lactococcus lactis Appl Environ Microbiol 88:e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. (2022) Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23 mBio 13:e0237522.

      Comment 19: Line 354. To make this statement that these genes are missing from the host, we would need to know that these genomes are complete.

      Thanks for your comments. The genomes of strains WC36 and zth2 were 100% complete, as assessed with CheckM v1.2.2. The genome sizes of strains WC36 and zth2 were 3,660,783 bp and 3,198,720 bp, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in the GenBank database with accession numbers CP085689 and CP071032, respectively. In addition, we collected cells of strains WC36 and zth2 and sent them to another company for whole genome sequencing (named WC36G and ZTH, GenBank accession numbers CP151801 and CP119760, respectively). The genomes of strains WC36G and ZTH were also 100% complete. The genome sizes of strains WC36G and ZTH were 3,660,783 bp and 3,198,714 bp, respectively. Therefore, the genomes of strains WC36 and zth2 are complete and circular.

      Comment 20: Figure 5. Please see https://peerj.com/articles/11447/ and https://doi.org/10.1093/nar/gkaa621 for a detailed discussion on vetting AMGs. Several of these should be eliminated according to the standards set in the field. More specifically, and by anecdotal comparison with other inoviridae genomes, for Phage-WC36-1 and Phage-zth2-1, I am not convinced that the transactional regulator and glycoside hydrolase are a part of the phage genome. The phage genome probably ends at the strand switch.

      Thanks for your comments. Following your suggestions, we have read these two articles carefully and revised the genomes of Phage-WC36-1 and Phage-zth2-1 by anecdotal comparison with other Inoviridae genomes. As you noted, the transactional regulator and glycoside hydrolase are not part of the phage genome.

      The new version Fig. 5A was shown.

      References related to this response:

      Shaffer, M., Borton, M.A., McGivern, B.B., Zayed, A.A., La Rosa, S.L., Solden, L.M., Liu, P., Narrowe, A.B., Rodríguez-Ramos, J., Bolduc, B., et al. (2020) DRAM for distilling microbial metabolism to automate the curation of microbiome function Nucleic Acids Res 48:8883-8900

      Pratama, A.A., Bolduc, B., Zayed, A.A., Zhong, Z.P., Guo, J., Vik, D.R., Gazitúa, M.C., Wainaina, J.M., Roux, S., and Sullivan, M.B. (2021) Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation PeerJ 9:e11447

      Comment 21: Line 380. This section needs to start with detailed evidence that this phage can even infect this particular strain. Added note, upon further reading the serial dilution cultures are not sufficient to prove these phage infect this Pseudomonas. We need at a minimum a one-step growth curve and wet mount microscopy. It is much more likely that some carry over contaminant is invading the culture and influencing OD600. With the given evidence, I am not at all convinced that these phages have anything to do with Pseudomonas polysaccharide use and I recommend either drastically revising this section or eliminating it entirely.

      Line 386-389. Could this be because you are observing your added phage in the starch enriched media while no phage were introduced with the "other types of media" so none would be observed? This could have nothing to do with infection dynamics. Further, this would also be consistent with your starch solution being contaminated by phage.

      Line 399. Again consistent with the starch media being contaminated.

      Line 401-408. This is more likely to do with the augmentation of the media with an additional carbon source and not involving the phage. 

      Line 410. I am not convinced that these viruses infect the Pseudomonas strain. Extensive further evidence of infection is needed to make these assertions.  Figure 6A. We need confirmation that the isolate culture remains pure and there are no other contaminants introduced with the phage.

      Thanks for your comments. We have shown above that the polysaccharides (laminarin/starch) were not contaminated with any phages. In fact, we selected many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could assist them in the degradation and utilization of polysaccharides, and found that Phages-WC36 could only promote the growth of strain 273. Filamentous and hexagonal phages were detected in the supernatant of strain 273 cultured in basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36. After 3 passages of serial cultivation in basal medium supplemented with 5 g/L starch, filamentous and hexagonal phages were also present in the starch-supplemented basal medium but not in the basal medium alone, which may mean that Phages-WC36 could infect strain 273 and that starch is an important inducer. In addition, the Phages-WC36 used in the growth assay of strain 273 were purified multiple times and finally suspended in SM buffer (0.01% gelatin, 50 mM Tris-HCl, 100 mM NaCl and 10 mM MgSO4). Thus, the phages provided did not contain extracellular enzymes and/or nutrients. We also set up three control groups in the growth assay of strain 273: basal medium, basal medium supplemented with Phages-WC36, and basal medium supplemented with starch. If the Phages-WC36 preparation contained extracellular enzymes and/or nutrients, strain 273 would also grow well in basal medium supplemented only with Phages-WC36. However, the poor growth of strain 273 in basal medium supplemented with Phages-WC36 further confirmed that these phages carried no extracellular enzymes and/or nutrients.

      Finally, a possible mechanism for chronic phage release without breaking the host is that the phage is enclosed in a lipid membrane and released from the host cells in a nonlytic manner. Thus, these chronic phages may have a wider host range. However, we were unable to further disclose the infection mechanism in this paper. Therefore, according to your suggestions, we have deleted this section entirely.

      Comment 27: Line 460. Details about how these genomes were reconstructed is needed here.  

      Thanks for your comments. According to your suggestions, we have added detailed information about the genome sequencing, annotation, and analysis as “Genome sequencing, annotation, and analysis of strains WC36 and zth2. For genomic sequencing, strains WC36 and zth2 were grown in the liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated by using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). Thereafter, the genome sequencing was carried out with both the Illumina NovaSeq PE150 (San Diego, USA) and Nanopore PromethION platform (Oxford, UK) at the Beijing Novogene Bioinformatics Technology Co., Ltd. A complete description of the library construction, sequencing, and assembly was performed as previously described (Zheng et al., 2021b). We used seven databases to predict gene functions, including Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole genome Blast search (E-value less than 1e-5, minimal alignment length percentage larger than 40%) was performed against the above seven databases.” in the revised manuscript (Lines 333-351).
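      As a reading aid, the hit-filtering thresholds quoted above (E-value below 1e-5, alignment length percentage above 40%) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the manuscript: the `Hit` record, its field names, and the choice to measure alignment length relative to the query length are all hypothetical, since the response does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one search hit (field names are assumptions)."""
    query: str
    subject: str
    evalue: float
    align_len: int   # length of the aligned region
    query_len: int   # full length of the query sequence


def passes_thresholds(hit, max_evalue=1e-5, min_align_pct=40.0):
    """Return True if the hit meets both thresholds quoted in the response.

    Assumption: "alignment length percentage" is taken relative to the
    query length; the source text does not say which length is used.
    """
    align_pct = 100.0 * hit.align_len / hit.query_len
    return hit.evalue < max_evalue and align_pct > min_align_pct


hits = [
    Hit("gene1", "protA", 1e-20, 90, 100),  # passes both thresholds
    Hit("gene2", "protB", 1e-3, 90, 100),   # E-value too large
    Hit("gene3", "protC", 1e-30, 30, 100),  # alignment too short
]
kept = [h.query for h in hits if passes_thresholds(h)]
print(kept)
```

      Both conditions are strict inequalities, matching the wording "less than" and "larger than" in the quoted text.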

      Comment 28: Line 462. Accession list of other taxa in the supplement would help here.  

      Thanks for your comments. The accession numbers of these strains were displayed behind these strains in Figure 1A. According to your suggestions, we have added an accession list of these taxa (Supplementary Table 6) in the revised manuscript.

      Comment 29: Line 463. Is there any literature to support that these are phylogenetically informative genes for Inoviridae?  

      Thanks for your comments. Several studies (Zeng et al., 2021; Evseev et al., 2023) support that these are phylogenetically informative genes for Inoviridae. We have added these references to the revised manuscript.

      References related to this response:

      Zeng, J., Wang, Y., Zhang, J., Yang, S., and Zhang, W. (2021) Multiple novel filamentous phages detected in the cloacal swab samples of birds using viral metagenomics approach Virol J 18:240

      Evseev, P., Bocharova, J., Shagin, D., and Chebotar, I. (2023) Analysis of Pseudomonas aeruginosa isolates from patients with cystic fibrosis revealed novel groups of filamentous bacteriophages. Viruses 15: 2215

      Reviewer #2 (Public Review):

      Summary: This paper investigates virus-host interactions in deep-sea bacteriophage systems which employ a seemingly mutualistic approach to viral replication in which the virus aids host cell polysaccharide import and utilization via metabolic reprogramming. The hypothesis being tested is supported with solid and convincing evidence and the findings are potentially generalizable with implications for our understanding of polysaccharide-mediated virus-host interactions and carbon cycles in marine ecosystems more broadly.

      Thanks for your positive comments.

      Strengths: This paper synthesizes sequencing and phylogenetic analyses of two Lentisphaerae bacteria and three phage genomes; electron microscopy imaging of bacterial/phage particles; differential gene expression analyses; differential growth curve analyses, and differential phage proliferation assays to extract insights into whether laminarin and starch can induce both host growth and phage proliferation. The data presented convincingly demonstrate that both host culture density and phage proliferation increase as a result of having host, phage, and polysaccharide carbon source together in culture.

      Thanks for your positive comments.  

      Weaknesses (suggestions for improvement): 

      (1) The article would be strengthened by the following additional experiment: providing the phage proteins hypothesized to be aiding host cell growth (red genes from Figure 5...TonB system energizer ExbB, glycosidases, etc) individually or in combination on plasmids rather than within the context of the actual phage itself to see if such additional genes are necessary and sufficient to realize the boosts in host cell growth/saturation levels observed in the presence of the phages tested.

      Thanks for your valuable comments. It is a really good idea to express those polysaccharide-degradation proteins individually or in combination on plasmids to see their effects in the host cell. However, at present, we have failed to construct a genetic and expression system for the strictly anaerobic strain WC36, which hinders further detailed investigation of the functions of those polysaccharide-degradation proteins. In our lab, we are trying our best to build a genetic and expression system for strain WC36. We will definitely test your idea in the future.

      (2) The paper would also benefit from additional experiments focused on determining how the polysaccharide processing, transport, and metabolism genes are being used by the phages to either directly increase viral infection/replication or else to indirectly do so by supporting the growth of the host in a more mutualistic manner (i.e. by improving their ability to import, degrade, and metabolize polysaccharides).  

      Thanks for your valuable comments. Indeed, because the chronic phage genome is not within the chromosome of the host, it is very hard to disclose the exact auxiliary process and mechanism of chronic phages. At present, we are trying to construct a genetic manipulation system for the strictly anaerobic host WC36, and we will gradually reveal this auxiliary mechanism in the future. In addition, in line with Reviewer 1's suggestions, the revised manuscript now focuses on the finding that polysaccharides induce deep-sea bacteria to release chronic phages, and most of the content on phages assisting host metabolism of polysaccharides has been deleted.

      (3) The introduction would benefit from a discussion of what is known regarding phage and/or viral entry pathways that utilize carbohydrate anchors during host entry. The discussion could also be improved by linking the work presented to the concept of "selfishness" in bacterial systems (see for instance Giljan, G., Brown, S., Lloyd, C.C. et al. Selfish bacteria are active throughout the water column of the ocean. ISME COMMUN. 3, 11 (2023) https://doi.org/10.1038/s43705-023-00219-7). The bacteria under study are gram negative and it was recently demonstrated (https://www.nature.com/articles/ismej201726) that "selfish" bacteria sequester metabolizable polysaccharides in their periplasm to advantage. It is plausible that the phages may be hijacking this "selfishness" mechanism to improve infectivity and ENTRY rather than helping their hosts to grow and proliferate so they can reap the benefits of simply having more hosts to infect. The current work does not clearly distinguish between these two distinct mechanistic possibilities. The paper would be strengthened by at least a more detailed discussion of this possibility as well as the authors' rationale for interpreting their data as they do to favor the "mutualistic" interpretation. In the same light, the paper would benefit from a more careful choice of words which can also help to make such a distinction more clear/evident/intentional. As currently written the authors seem to be actively avoiding giving insights with respect to this question.

      Thanks for your valuable comments. According to your suggestions, we have added the related discussion as “Moreover, it was recently demonstrated that selfish bacteria, which were common throughout the water column of the ocean, could bind, partially hydrolyze, and transport polysaccharides into the periplasmic space without loss of hydrolysis products (Reintjes et al., 2017; Giljan et al., 2023). Based on our results, we hypothesized that these chronic phages might also enter the host through this “selfishness” mechanism while assisting the host in metabolizing polysaccharides, thus not lysing the host. On the other hand, these chronic phages might hijack this “selfishness” mechanism to improve their infectivity and entry, rather than helping their hosts to grow and proliferate, so they could reap the benefits of simply having more hosts to infect. In the future, we need to construct a genetic operating system of the strictly anaerobic host strain WC36 to detailedly reveal the relationship between chronic phage and host.” in the revised manuscript (Lines 305-316). 

      References related to this response:

      Reintjes, G., Arnosti, C., Fuchs, B.M., and Amann, R. (2017) An alternative polysaccharide uptake mechanism of marine bacteria ISME J 11:1640-1650

      Giljan, G., Brown, S., Lloyd, C.C., Ghobrial, S., Amann, R., and Arnosti, C. (2023) Selfish bacteria are active throughout the water column of the ocean ISME Commun 3:11

      (4) Finally, I would be interested to know if the authors’ sequencing datasets might be used to inform the question raised above by using bacterial immunity systems such as CRISPR/Cas9. For example, if the phage systems studied are truly beneficial/mutualistic for the bacteria then it’s less likely that there would be evidence of targeted immunity against that particular phage that has the beneficial genes that support polysaccharide metabolism.

      Thanks for your comments. According to your suggestions, we have carefully analyzed the genome of strain WC36 and found no CRISPR/Cas9-related genes. Considering our result that the number of chronic phages increased with prolonged culture time, we speculate that the host might have no targeted immunity against these chronic phages.

      Reviewer #2 (Recommendations For The Authors):

      There are some minor grammatical errors and unclear statements (lines 99-100, 107-109, 163, 222, 223, 249-250, 254) which should also be fixed before final publication. 

      Thanks for your valuable comments. We have fixed these minor grammatical errors and unclear statements in the revised manuscript.

      Lines 99-100: we have modified this description as “For instance, AMGs of marine bacteriophages have been predicted to be involved in photosynthesis (Mann et al., 2003), nitrogen cycling (Ahlgren et al., 2019; Gazitúa et al., 2021), sulfur cycling (Anantharaman et al., 2014; Roux et al., 2016), phosphorus cycling (Zeng and Chisholm, 2012), nucleotide metabolism (Sullivan et al., 2005; Dwivedi et al., 2013; Enav et al., 2014), and almost all central carbon metabolisms in host cells (Hurwitz et al., 2013).” in the revised manuscript (Lines 100-105).

      Lines 107-109: we have modified this description as “However, due to the vast majority of deep-sea microbes cannot be cultivated in the laboratory, most bacteriophages could not be isolated.” in the revised manuscript (Lines 110-111).

      Line 163: we have modified this description as “Based on the growth curve of strain WC36, we found that the growth rate of strictly anaerobic strain WC36 was relatively slow.” in the revised manuscript (Lines 149-151).

      Lines 222-223: we have modified this description as “Regardless of whether the laminarin was present, the bacterial cells kept their cell shape intact, indicating they were still healthy after 30 days” in the revised manuscript (Lines 195-197).

      Lines 249-250: we have modified this description as “However, the entry and exit of the hexagonal phages into the WC36 cells were not observed.” in the revised manuscript (Lines 190-191).

      Line 254: we have modified this description as “To explore whether the production of bacteriophages induced by polysaccharide is an individual case, we further checked the effect of polysaccharides on another cultured deep-sea Lentisphaerae strain zth2.” in the revised manuscript (Lines 213-215).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      Thank you for your review and for pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to just capture M1. We have also updated the introduction to make it more clear which species the experiments which motivate our investigation were performed in.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) are used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g. in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that they are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Nevertheless, this linearity assumption is indeed convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).
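      To illustrate why the optimal control problem is easier in the linear setting, here is a minimal sketch of a finite-horizon, discrete-time LQR solved by a backward Riccati recursion for a scalar system. It is not the authors' model or their iLQR implementation: the scalar dynamics x[t+1] = a*x[t] + b*u[t], the cost weights, and all function names are assumptions chosen for illustration. With linear dynamics and a quadratic cost, a single backward pass gives the global optimum; iLQR repeats a pass of this kind around successive linearizations of nonlinear dynamics, which is where local minima can arise.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t].

    Cost: sum_t (q*x[t]**2 + r*u[t]**2) + q_final*x[horizon]**2.
    Returns the feedback gains k[t] (with u[t] = -k[t]*x[t]) and the
    cost-to-go coefficient p0, so that the optimal cost is p0 * x0**2.
    """
    p = q_final
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        k = (b * p * a) / (r + b * b * p)   # optimal gain at step t
        p = q + a * a * p - a * p * b * k   # Riccati update
        gains[t] = k
    return gains, p


def rollout_cost(a, b, q, r, q_final, gains, x0):
    """Simulate the closed loop and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x


gains, p0 = lqr_gains(a=0.9, b=0.5, q=1.0, r=0.1, q_final=5.0, horizon=20)
optimal = rollout_cost(0.9, 0.5, 1.0, 0.1, 5.0, gains, x0=2.0)
uncontrolled = rollout_cost(0.9, 0.5, 1.0, 0.1, 5.0, [0.0] * 20, x0=2.0)
print(optimal, p0 * 2.0 ** 2, uncontrolled)
```

      The simulated closed-loop cost matches the value predicted by the recursion, and it is lower than the cost of applying no input at all.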

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C away (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises of what exactly C should be optimized for, and under what constraints (e.g. fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C away, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.
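      The link between the readout C and observability mentioned above can be made concrete with a small sketch. It assumes a discrete-time linear system x[t+1] = A x[t] with readout y[t] = C x[t] and a finite horizon, which is a simplification and not necessarily the formulation used in the manuscript; the matrices and function names are hypothetical. The observability Gramian Q sums (A^t)^T C^T C A^t over time, so the output energy evoked by an initial condition x0 equals x0^T Q x0: a readout with larger Gramian yields the same output energy from a smaller initial condition.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def observability_gramian(a, c, steps):
    """Finite-horizon Gramian: sum over t < steps of (C A^t)^T (C A^t)."""
    n = len(a)
    gram = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        ca = matmul(c, power)                 # C A^t
        term = matmul(transpose(ca), ca)
        gram = [[gram[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        power = matmul(a, power)              # A^(t+1)
    return gram


def output_energy(a, c, x0, steps):
    """Directly simulate x[t+1] = A x[t] and sum |C x[t]|^2."""
    x, energy = list(x0), 0.0
    for _ in range(steps):
        y = [sum(row[j] * x[j] for j in range(len(x))) for row in c]
        energy += sum(v * v for v in y)
        x = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
    return energy


A = [[0.9, 0.5], [0.0, 0.8]]   # hypothetical stable dynamics
C = [[1.0, 0.0]]               # hypothetical single-channel readout
x0 = [1.0, -2.0]
Q = observability_gramian(A, C, steps=50)
quad = sum(x0[i] * Q[i][j] * x0[j] for i in range(2) for j in range(2))
print(quad, output_energy(A, C, x0, steps=50))
```

      The two printed numbers agree up to floating-point error, which is the identity the argument relies on.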

      Additional comments:

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.
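      The three input penalties discussed in this exchange (squared input, absolute value, squared temporal derivative) can be compared on toy signals. The sketch below uses simple Riemann sums and finite differences on hand-picked sequences; the function names and example inputs are assumptions for illustration and are unrelated to the simulations in the manuscript.

```python
def squared_cost(u, dt):
    """Discretized integral of u(t)^2."""
    return sum(v * v for v in u) * dt


def absolute_cost(u, dt):
    """Discretized integral of |u(t)|."""
    return sum(abs(v) for v in u) * dt


def squared_derivative_cost(u, dt):
    """Discretized integral of (du/dt)^2 using forward differences."""
    return sum(((u[i + 1] - u[i]) / dt) ** 2 for i in range(len(u) - 1)) * dt


dt = 1.0
pulse = [4.0, 0.0, 0.0, 0.0]    # brief, large input
spread = [1.0, 1.0, 1.0, 1.0]   # sustained, small input with the same area

for name, cost in [("squared", squared_cost),
                   ("absolute", absolute_cost),
                   ("squared derivative", squared_derivative_cost)]:
    print(name, cost(pulse, dt), cost(spread, dt))
```

      The absolute-value penalty does not distinguish the two inputs, whereas the squared penalty favors the sustained input and the derivative penalty favors it even more strongly. This is one way to see why the choice of cost can change the qualitative shape of the optimal inputs.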

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al. eNeuron 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the author's inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g., the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g., one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, what this means is that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the authors' cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.
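      The per-timestep linearization mentioned above can be sketched as follows. This is a minimal illustration only: the rate-network update, the tanh nonlinearity, and all names (`step`, `linearize`, `W`, `B`, `dt`, `tau`) are assumptions made for the example, not the model used in the manuscript.

```python
import numpy as np

def step(x, u, W, B, dt=0.01, tau=0.15):
    """One Euler step of a hypothetical rate network (illustrative only)."""
    return x + (dt / tau) * (-x + W @ np.tanh(x) + B @ u)

def linearize(x, u, W, B, dt=0.01, tau=0.15):
    """Jacobians of `step` with respect to state and input, evaluated at (x, u)."""
    n = x.shape[0]
    dphi = 1.0 - np.tanh(x) ** 2  # derivative of tanh at the current state
    # d(W tanh(x))_i / dx_j = W_ij * dphi_j, i.e. the columns of W are rescaled.
    A = np.eye(n) + (dt / tau) * (-np.eye(n) + W * dphi[None, :])
    B_lin = (dt / tau) * B
    return A, B_lin

rng = np.random.default_rng(0)
n, m = 5, 2
W = rng.normal(scale=0.5, size=(n, n))
B = rng.normal(size=(n, m))
x0 = rng.normal(size=n)
u0 = rng.normal(size=m)

# Local linear model x_{t+1} ~ A x_t + B_lin u_t around (x0, u0).
A, B_lin = linearize(x0, u0, W, B)
```

      In this sketch the local model is exact only to first order around the current trajectory, which is the sense in which a linearization-based optimizer sees the nonlinearity only through its local slope.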

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g., by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g., what brain region would that other RNN represent?).

      (5)  Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as just representing M1, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript, we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
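      The state augmentation mentioned above can be sketched for a linear system as follows. This is a minimal illustration under stated assumptions: the linear dynamics x(t+1) = A x(t) + B u(t), the helper names (`augment`, `smoothness_cost`), and the random toy matrices are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def augment(A, B):
    """Augmented system with state z = [x; u_prev] and control v = u - u_prev."""
    n, m = B.shape
    A_aug = np.block([[A, B], [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug

def smoothness_cost(U):
    """Sum over t of ||u(t+1) - u(t)||^2 for inputs U of shape (T, m)."""
    return float(np.sum(np.diff(U, axis=0) ** 2))

rng = np.random.default_rng(0)
n, m, T = 4, 2, 20
A = 0.9 * np.eye(n) + 0.05 * rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
U = np.cumsum(rng.normal(scale=0.1, size=(T, m)), axis=0)  # toy input sequence

# Roll out the original system driven by u(t).
x = np.zeros(n)
X = [x]
for t in range(T):
    x = A @ x + B @ U[t]
    X.append(x)
X = np.array(X)

# Roll out the augmented system driven by v(t) = u(t) - u(t-1), taking u(-1) = 0.
A_aug, B_aug = augment(A, B)
V = np.diff(np.vstack([np.zeros(m), U]), axis=0)
z = np.zeros(n + m)
Z = [z]
for t in range(T):
    z = A_aug @ z + B_aug @ V[t]
    Z.append(z)
Z = np.array(Z)
```

      In this sketch the temporal-difference penalty becomes an ordinary quadratic cost on the new control v, at the price of growing the state from n to n + m dimensions; penalizing a further derivative would add another m dimensions in the same way.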

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that will be shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by either preparing so as to be ready for the earliest go cue possible or alternatively to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations with those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by, e.g., combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points. Note that we have kept our analyses on these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified/made more realistic. This would impact the qualitative behavior of the system and match to data but – in the examples we have so far considered – does not appear to modify the general strategy of networks relying on preparation.

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback / inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

      Round 2 of reviews

      Reviewer 3:

      My remaining comments largely pertain to some subtle (but to me important) nuances at a few locations in the text. These should be easy for the authors to address, in whatever way they see fit.

      Specific comments:

      (1) The authors state the following on line 56: "For preparatory processes to avoid triggering premature movement, any pre-movement activity in the motor and dorsal pre-motor (PMd) cortices must carefully exclude those pyramidal tract neurons."

      This constraint is overly restrictive. PT neurons absolutely can change their activity during preparation in principle (and appear to do so in practice). The key constraint is looser: those changes should have no net effect on the muscles. E.g., if d is the vector of changes in PT neuron firing rates, and b is the vector of weights, then the constraint is that b'd = 0. d = 0 is one good way of doing this, but only one. Half the d's could go up and half could go down. Or they all go up, but half the b's are negative. Put differently, there is no reason the null space has to be upstream of the PT neurons. It could be partly, or entirely, downstream. In the end, this doesn't change the point the authors are making. It is still the case that d has to be structured to avoid causing muscle activity, which raises exactly the point the authors care about: why risk this unless preparation brings benefits? However, this point can be made with a more accurate motivation. This matters, because people often think that a null-space is a tricky thing to engineer, when really it is quite natural. With enough neurons, preparing in the null space is quite simple.
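      The looser constraint b'd = 0 can be illustrated numerically. This is a minimal sketch under assumed values: a single muscle, 50 hypothetical PT neurons, and random readout weights `b`; none of these numbers come from the manuscript or the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50
b = rng.normal(size=n_neurons)      # hypothetical readout weights onto one muscle

# Start from an arbitrary change in PT firing rates.
d_raw = rng.normal(size=n_neurons)

# Remove the component along b; what remains lies in the null space of the readout.
d_null = d_raw - (b @ d_raw) / (b @ b) * b

net_effect = b @ d_null                              # numerically zero
fraction_changed = np.mean(np.abs(d_null) > 1e-3)    # share of neurons whose rate changed
```

      In this toy setting many individual neurons change their rates while the net drive to the muscle stays at zero, consistent with the point that d = 0 is only one of many ways to satisfy the constraint.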

      That is a good point – we have now reformulated this sentence to instead say “to avoid triggering premature movement, any pre-movement activity in the motor and dorsal premotor (PMd) cortices must engage the pyramidal tract neurons in a way that ensures their activity patterns will not lead to any movement”.

      (2) Line 167: 'near-autonomous internal dynamics in M1'.

      It would be good if such statements, early in the paper, could be modified to reflect the fact that the dynamics observed in M1 may depend on recurrence that is NOT purely internal to M1. A better phrase might be 'near-autonomous dynamics that can be observed in M1'. A similar point applies on line 13. This issue is handled very thoughtfully in the Discussion, starting on line 713. Obviously it is not sensible to also add multiple sentences making the same point early on. However, it is still worth phrasing things carefully, otherwise the reader may have the wrong impression up until the Discussion (i.e. they may think that both the authors, and prior studies, believe that all the relevant dynamics are internal to M1). If possible, it might also be worth adding one sentence, somewhere early, to keep readers from falling into this hole (and then being stuck there till the Discussion digs them out).

      That is a good point: we have now edited the text after line 170 to make it clear that the underlying dynamics may not be confined to M1, and have referenced the later discussion there.

      (3) The authors make the point, starting on line 815, that transient (but strong) preparatory activity empirically occurs without a delay. They note that their model will do this but only if 'no delay' means 'no external delay'. For their model to prepare, there still needs to be an internal delay between when the first inputs arrive and when movement generating inputs arrive.

      This is not only a reasonable assumption, but is something that does indeed occur empirically. This can be seen in Figure 8c of Lara et al. Similarly, Kaufman et al. 2016 noted that "the sudden change in the CIS [the movement triggering event] occurred well after (~150 ms) the visual go cue... (~60 ms latency)" Behavioral experiments have also argued that internal movement-triggering events tend to be quite sluggish relative to the earliest they could be, causing RTs to be longer than they should be (Haith et al. Independence of Movement Preparation and Movement Initiation). Given this empirical support, the authors might wish to add a sentence indicating that the data tend to justify their assumption that the internal delay (separating the earliest response to sensory events from the events that actually cause movement to begin) never shrinks to zero.

      While on this topic, the Haith and Krakauer paper mentioned above is good to cite because it does ponder the question of whether preparation is really necessary. By showing that they could get RTs to shrink considerably before behavior became inaccurate, they showed that people normally (when not pressured) use more preparation time than they really need. Given Lara et al, we know that preparation does always occur, but Haith and Krakauer were quite right that it can be very brief. This helped -- along with neural results -- change our view of preparation from something more cognitive that had to occur, to something more mechanical that was simply a good network strategy, which is indeed the authors' current point. Working a discussion of this into the current paper may or may not make sense, but if there is a place where it is easy to cite, it would be appropriate.

      This is a nice suggestion, and we thank the reviewer for pointing us to the Haith and Krakauer paper. We have now added this reference and extended the paragraph following line 815 to briefly discuss the possible decoupling between preparation and movement initiation that is shown in the Haith paper, emphasizing how this may affect the interpretation of the internal delay and comparisons with behavioral experiments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) Peptides were synthesized with fluorescein isothiocyanate (FITC) and Tat tag, and then PEGylated with methoxy PEG Succinimidyl Succinate.

      I have two concerns about the peptide design. First, FITC was intended "for monitoring" (line 129), but was never used in the manuscript. Second, PEGylation targets the two lysine sidechains on the Tat, which would alter its penetration property.

      (1) We conducted an analysis of the cellular trafficking of FITC-tagged peptides following their permeabilization into cells.

      Author response image 1.

      However, we did not include it in the main text because it is a basic result.

      (2) As can be seen in the figure above, after PEGylation and permeabilization, the cells were stained with FITC. PEGylation therefore does not appear to affect the ability of the peptide to penetrate into the cells.

      (2) "Superdex 200 increase 10/300 GL column" (line 437) was used to isolate mono/di PEGylated PDZ and separate them from the residual PEG and PDZ peptide. "m-PEG-succinimidyl succinate with an average molecular weight of 5000 Da" (lines 133 and 134).

      To my knowledge, the Superdex 200 increase 10/300 GL column is not suitable and is unlikely to produce traces shown in Figure 1B.

      As Superdex 200 increase 10/300 GL features a fractionation range of 10,000 to 600,000 Da, we used it to fractionate PEGylated products including DiPEGylated PDZ (approx. 15 kDa) and MonoPEGylated PDZ (approx. 10 kDa) from residuals (PDZ and PEG), demonstrating successful isolation of PEGylated products (Figure 1C). Considering the molecular weights of PDZ and PEG are approximately 4.1 kDa and 5.0 kDa, respectively, the late eluting peaks from SEC were likely to represent a mixed absorbance of PDZ and PEG at 215 nm.

      However, as the reviewer pointed out, it could be unreasonable to annotate peaks representing PDZ and PEG, respectively, from mixed absorbance detected in a region (11-12 min) beyond the fractionation range.

      In our revised manuscript, therefore, multiple peaks in the late eluting volume (11-12 min) were labeled as 'Residuals' all together. As a reference, the revised figure 1B includes a chromatogram of pure PDZ-WT under the same analytic condition.

      Accordingly, we replaced Fig. 1B with the new results.

      (3) "the in vivo survival effect of LPS and PDZ co-administration was examined in mice. The pretreatment with WT PDZ peptide significantly increased survival and rescued compared to LPS only; these effects were not observed with the mut PDZ peptide (Figure 2a)." (lines 159-160).

      Fig 2a is the weight curve only. The data is missing in the manuscript.

      We added the survival curve to Fig. 2A.

      (4) Table 1, peptide treatment on ALT and AST appears minor.

      In mice treated with LPS, blood levels of ALT and AST are elevated, and these levels decrease upon treatment with WT PDZ, whereas mut PDZ does not produce significant changes. Figure 3A shows inflammatory cells within the central vein, yet no substantial hepatotoxicity is observed during the 5-day treatment with LPS. According to UCLA Diagnostic Labs, the normal ranges of ALT and AST in C57BL6 mice are 16 ~ 200 U/L and 46 ~ 221 U/L, respectively, and the values in all experiments fall within these ranges. In summary, a 5-day treatment with LPS induces inflammation in the liver but is too short to induce hepatotoxicity, resulting in lower values.

      (5) MitoTracker Green FM shouldn't produce red images in Figure 6.

      We replaced Figs 6A and B with new results (green images).

      (6) Figure 5. Comparison of mRNA expression in PDZ-treated BEAS-2B cells. Needs a clearer and more detailed description both in the main text and figure legend. The current version is very hard to read.

      We replaced Fig. 5A with a new version that is easier to understand, and added more detailed results and a more detailed figure legend.

      Results Section in Figure 5:

      we performed RNA sequencing analysis. The results of RNA-seq analysis showed the expression pattern of 24,424 genes according to each comparison combination, of which the results showed the similarity of 51 genes overlapping in 4 gene categories and the similarity between each comparison combination (Figure 5a). As a result, compared to the control group, it was confirmed that LPS alone, WT PDZ+LPS, and mut PDZ+LPS were all upregulated above the average value in each gene, and when LPS treatment alone was compared with WT PDZ+LPS, it was confirmed that they were averaged or downregulated. When comparing LPS treatment alone and mut PDZ+LPS, it was confirmed that about half of the genes were upregulated. Regarding the similarity between comparison combinations, the comparison combination with LPS…

      Figure 5 Legend Section:

      Figure 5. Comparison of mRNA expression in PDZ-treated BEAS-2B cells.

      BEAS-2B cells were treated with wild-type PDZ or mutant PDZ peptide for 24 h and then incubated with LPS for 2 h, after which RNA sequencing analysis was performed. (a) The heat map shows the general regulation pattern of about 51 inflammation-related genes that are differentially expressed when WT PDZ and mut PDZ are treated with LPS, an inflammatory substance. All samples are RED = upregulated and BLUE = downregulated relative to the gene average. Each row represents a gene, and the columns represent the values of the control group treated only with LPS and the WT PDZ and mut PDZ groups with LPS. This was used by converting each log value into a fold change value. All genes were adjusted to have the same mean and standard deviation, the unit of change is the standard deviation from the mean, and the color value range of each row is the same. (b) Significant genes were selected using a gene category chart (fold change value of 2.00 and normalized data (log2) value of 4.00). The pie chart above shows the distribution of four gene categories when comparing LPS versus control, WT PDZ+LPS/LPS, and mut PDZ+LPS/LPS. The bar graph below shows RED = upregulated and GREEN = downregulated for each gene category, and shows the number of upregulated and downregulated genes in each gene category. (c) The protein-protein interaction network constructed by the STRING database differentially displays commonly occurring genes by comparing WT PDZ+LPS/LPS, mut PDZ+LPS/LPS, and LPS. The nodes represent proteins associated with inflammation, and the connecting lines denote interactions between two proteins. Different line thicknesses indicate types of evidence used in predicting the associations.

      Reviewer #2:

      (1) In this paper, the authors demonstrated the anti-inflammatory effect of PDZ peptide by inhibition of NF-kB signaling. Are there any results on the PDZ peptide-binding proteins (directly or indirectly) that can regulate LPS-induced inflammatory signaling pathway? Elucidation of the PDZ peptide-its binding partner protein and regulatory mechanisms will strengthen the author's hypothesis about the anti-inflammatory effects of PDZ peptide.

      As mentioned in the Discussion section, we believe it is crucial to identify proteins that directly interact with PDZ and regulate it. This direct interaction can modulate intracellular signaling pathways, so we plan to express GST-PDZ and induce binding with cellular lysates, then characterize it using the LC-Mass/Mass method. We intend to further research these findings and submit them for publication.

      (2) The authors presented interesting insights into the therapeutic role of the PDZ motif peptide of ZO-1. PDZ domains are protein-protein interaction modules found in a variety of species. It has been thought that many cellular and biological functions, especially those involving signal transduction complexes, are affected by PDZ-mediated interactions. What is the rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1 shown in Figure 1A?

      The rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1, as shown in Figure 1A, is grounded in the specific roles these motifs play in signal transduction pathways that are crucial for inflammatory processes. PDZ domains are recognized for their ability to function as scaffolding proteins that organize signal transduction complexes, crucial for modulating cellular and biological functions. The chosen core sequence is particularly important because it is conserved across ZO-1, ZO-2, and ZO-3, indicating a fundamental role in maintaining cellular integrity and signaling pathways. This conservation suggests that the sequence’s involvement in inflammatory regulation is not only significant in ZO-1 but also reflects a broader biological function across the ZO family.

      (3) In Figure 3, the authors showed the representative images of IHC, please add the quantification analysis of Iba1 expression and PAS-positive cells using Image J or other software. To help understand the figure, an indication is needed to distinguish specifically stained cells (for example, a dotted line or an arrow).

      We added the semi-quantitative results into Figs. 3d,e,f.

      Result section: The specific physiological mechanism by which WT PDZ peptide decreases LPS-induced systemic inflammation in mice and the signal molecules involved remain unclear. These were confirmed by a semi-quantitative analysis of Iba-1 immunoreactivity and PAS staining in liver, kidney, and lung, respectively (Figures 4d, e, and f). To examine whether WT PDZ peptide can alter LPS-induced tissue damage in the kidney, a cell toxicity assay was performed (Figure 3g). LPS induced cell damage in the kidney; however, WT PDZ peptide significantly alleviated the toxicity, whereas mut PDZ peptide did not. Because cytotoxicity caused by LPS is frequently due to ROS production in the kidney (Su et al., 2023; Qiongyue et al., 2022), ROS production in the mitochondria was investigated in renal mitochondria cells harvested from kidney tissue (Figure 3h)......

      Figure legend section: Indicated scale bars were 20 μm. (d,e,f) Semi-quantitative analysis of the area positive for Iba-1 in liver and kidney, and of PAS-positive cells in lung, respectively. (g) After the kidneys were harvested, tissue lysates were used for MTT assay. (h) After.....

      (4) In Figure 6G, H, the authors confirmed the change in expression of the M2 markers by PDZ peptide using the mouse monocyte cell line Raw264.7. It would be good to add an experiment on changes in M1 and M2 markers caused by PDZ peptides in human monocyte cells (for example, THP-1).

      We thank you for your comments. To determine whether PDZ peptide regulates M1/M2 polarization in human monocytes, we examined changes in M1 and M2 gene expression in THP-1 cells. As a result, wild-type PDZ significantly suppressed the expression of M1 marker genes (hIL-1β, hIL-6, hIL-8, hTNF-α), while increasing the expression of M2 marker genes (hIL-4, hIL-10, hMRC-1). However, mutant PDZ did not affect M1/M2 polarization. These results suggest that PDZ peptide can suppress inflammation by regulating M1/M2 polarization of human monocyte cells. These results are for the reviewer's reference only and will not be included in the main content.

      Author response image 2.

      Minor point:

      The use of language is appropriate, with good writing skills. Nevertheless, a thorough proofread would eliminate small mistakes such as:

      • line 254, " mut PDZ+LPS/LPS (45.75%) " → " mut PDZ+LPS/LPS (47.75%) "

      • line 296, " Figure 6f " → " Figure 6h "

      We have corrected these points in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study presents a novel pipeline for the large-scale genomic prediction of members of the non-ribosomal peptide group of pyoverdines based on a dataset from nearly 2000 Pseudomonas genomes. The advance presented in this study is largely based on solid evidence, although some main claims are only incompletely supported. This study on bacterial siderophores has broad theoretical and practical implications beyond a singular subfield.

      Thank you for the supportive and encouraging words. We appreciate the editor’s and reviewers’ careful and professional assessment of this manuscript. The reviewers’ scrutiny has helped us to improve the presentation and discussion of our work. We have now carefully revised the manuscript following their instructive suggestions and comments. Please find below our detailed responses (marked in blue) to each of the comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript introduces a bioinformatic pipeline designed to enhance the structure prediction of pyoverdines, revealing an extensive and previously overlooked diversity in siderophores and receptors. Utilizing a combination of feature sequence and phylogenetic approaches, the method aims to address the challenging task of predicting structures based on dispersed gene clusters, particularly relevant for pyoverdines.

      Predicting structures based on gene clusters is still challenging, especially pyoverdines as the gene clusters are often spread to different locations in the genome. An improved method would indeed be highly useful, and the diversity of pyoverdine gene clusters and receptors identified is impressive.

      However, so far the method basically aligns the structural genes and domains involved in pyoverdine biosynthesis and then predicts A domain specificity to predict the encoded compounds. Both methods are not particularly new as they are included in other tools such as PRISM (10.1093/nar/gkx320) or Sandpuma (https://doi.org/10.1093/bioinformatics/btx400) among others. The study claims superiority in A domain prediction compared to existing tools, yet the support is currently limited, relying on a comparison solely with AntiSMASH. A more extensive and systematic comparison with other tools is needed.  

      Thanks for pointing this out. In the revised manuscript, we have included a comprehensive comparative analysis, in which we compared our pipeline to six commonly used methods: NP.searcher, PRISM4, AdenPredictor, SeMPI2, SANDPUMA, and antiSMASH5 (see Supplementary_table 6 for details, and lines 281-286). These approaches either consist of a single specific algorithm or integrate several methods. Our approach performs best (see table below), demonstrating a clear improvement over previous tools. The improvements are due to several methodological differences inherent to our approach. Additionally, while exploring existing prediction tools, we found that some had not been maintained for years. For instance, we were unable to access NRPSsp (www.nrpssp.com) and NRPSpredictor2 (http://nrps.informatik.uni-tuebingen.de/). Below, we briefly explain these differences, particularly in relation to PRISM and SANDPUMA, as highlighted by the reviewer.

      Author response table 1.

      PRISM annotates biosynthetic gene clusters (BGC) and reconstructs the linear structures of NRPS synthetases, with this function depending on proper annotations of open reading frames. This pipeline can have difficulties in assembling the linear structure into a final product. In our approach, we found that the annotations of NRPS gene are frequently truncated because of sequencing errors and annotation issues. Our method fixes this problem through rescanning all possible reading frames of the BGC to rebuild complete pyoverdine synthetase genes. 
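      The idea of rescanning every reading frame can be illustrated with a generic six-frame translation. This is only a minimal sketch, not the authors' implementation: the function names and the `min_len` cutoff are hypothetical, and the real pipeline additionally has to stitch fragments back into complete synthetase genes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(dna, min_len=2):
    """List (strand, frame, peptide) for each stop-free stretch of >= min_len residues.

    min_len is a hypothetical cutoff chosen for illustration only.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    hits.append((strand, frame, segment))
    return hits
```

      Scanning both strands and all three offsets means that a gene truncated by a frameshift in the annotation can still be recovered as fragments in different frames.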

      SANDPUMA and our approach are based on similar ideas (using the prediCAT algorithm) to predict A domain substrates, namely by using the closest annotated reference A domain. However, our method uses a self-adaptive feature extraction step to reduce the confounding influence of phylogeny. This small adjustment significantly improves the performance of our approach and even works well for small training sets (101 experimentally validated A domains with our approach as opposed to 494 A domains used by SANDPUMA from MIBiG).
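      The "closest reference" principle can be sketched as a nearest-neighbour lookup over aligned feature sequences. This is an illustration under stated assumptions, not the authors' algorithm: the mismatch-fraction distance, the `max_distance` threshold, and the toy sequences are all hypothetical, and the self-adaptive feature extraction step is not reproduced here.

```python
def feature_distance(a, b):
    """Fraction of mismatching positions between two aligned feature sequences."""
    if len(a) != len(b):
        raise ValueError("feature sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def predict_substrate(query, references, max_distance=0.3):
    """Return (label, distance) of the closest reference A domain.

    references is a list of (feature_sequence, substrate_label) pairs.
    Queries farther than max_distance (a hypothetical cutoff) from every
    reference are reported as "unknown" rather than forced into a class.
    """
    best_label, best_dist = None, float("inf")
    for feature, label in references:
        dist = feature_distance(query, feature)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > max_distance:
        return "unknown", best_dist
    return best_label, best_dist


# Toy reference set with made-up feature sequences.
toy_references = [("DAVKS", "Ser"), ("DLTKV", "Asp")]
```

      Reporting distant queries as "unknown" mirrors how unresolved A domains can be flagged for experimental follow-up instead of being assigned a likely wrong substrate.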

      Additionally, in contradiction to the authors' claims, the method's applicability seems constrained to well-known and widely distributed gene clusters. The absence of predictions for new amino acids raises concerns about its generalizability to NRPS beyond the studied cases.

      We thank the reviewers for this comment. We acknowledge that our method cannot directly predict new amino acids. Nevertheless, for several reasons we believe that our approach is not constrained and can be widely applied in the future.

      First, our method can identify A domains that select new unknown amino acid substrates. In fact, three of the four unresolved cases in our experimental verification analysis (Fig. 3d) represent new amino acids. Obviously, experimental verification is required to characterize the unknown substrate. Once verified, the new A domains and their substrates can expand the reference dataset, allowing targeted improvement of our phylogeny-focused prediction technique. We now discuss this aspect in lines 634-645.

      Second, although the overall substrate diversity in NRPS is high across the microbial kingdom, our analysis suggests that the number of amino acids used for a specific group of secondary metabolites quickly reaches a saturation point. The discovery rate of new amino acids was 1.7% for our experimental Pseudomonas data set (Fig. 3d) and even 0.0% for the Burkholderiales data set. This suggests that as the database expands, the discovery rate of novel amino acid substrates is expected to drop rapidly.

      Third, we acknowledge that the inability to predict the substrates of unknown domains is a common limitation among all knowledge-guided learning algorithms, including ours. However, we have made significant improvements in prediction accuracy. As the database grows, we expect the rate of unknown substrates to decrease, and the prediction accuracy to increase.

      The manuscript lacks clarity on how the alignment of structural genes operates when dealing with multiple NRPS gene clusters on different genome contigs. How would the alignment of each BGC work?

      We thank the reviewers for this comment. The pyoverdine molecules consist of a conserved fluorescent chromophore (Flu) and a peptide chain (Pep), both synthesized by NRPS enzymes. In most instances (over 90%), Flu and Pep are produced by two separate biosynthetic gene clusters (BGCs). In these cases, we merge the two BGCs by positioning Flu at the head and Pep at the tail. For the remaining less than 10%, there are two scenarios: 1. Flu and Pep are located on the same BGC, which eliminates any issues with BGC alignment. 2. In very rare cases, Flu and Pep are synthesized by three BGCs. Here, Flu is still synthesized by one BGC at the head, while Pep is produced by two BGCs. We put the BGC containing the Thioesterase (TE) domain as the tail and the BGC not containing the TE domain in the middle.

      (see lines 165-169).
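      The ordering rule described above (Flu at the head, the TE-containing cluster at the tail, any remaining cluster in the middle) can be written as a small sorting step. This is a minimal sketch assuming each BGC has already been annotated; the `Cluster` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    has_chromophore: bool  # cluster encodes the Flu (chromophore) part
    has_te_domain: bool    # cluster contains the thioesterase (TE) domain


def order_clusters(clusters):
    """Order BGCs as head (Flu), middle (no TE), tail (TE)."""
    def rank(cluster):
        if cluster.has_chromophore:
            return 0  # head; also covers Flu and Pep on a single BGC
        if cluster.has_te_domain:
            return 2  # tail
        return 1      # middle
    return sorted(clusters, key=rank)
```

      For example, three clusters given in arbitrary order come back as Flu, then the Pep cluster without TE, then the Pep cluster with TE.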

      Another critical concern is that a main challenge in NRPS structure prediction is not the backbone prediction but rather the prediction of tailoring reactions, which is not addressed in the manuscript at all, and this limitation extensively restricts the applicability of the method.

      While we thank the reviewer for this comment, we only partly agree with it. Peptide backbone predictions are still a significant challenge. This challenge is clearly visible in our new analysis comparing prediction accuracies of different pipelines, such as antiSMASH5, PRISM4, AdenPredictor, SeMPI2, NP.searcher, and SANDPUMA. Unresolved and wrong substrate predictions are still common, highlighting the importance of our contribution in developing a new approach with improved accuracy.

      However, we agree with the reviewer that our current algorithm does not predict tailoring reactions (now discussed on lines 680-685). Although tailoring reactions are important for predicting the final NRPS product structure, none of the other existing pipelines address this issue either, and it remains a challenge for future work. For our study, it is important to note that the specificity of pyoverdines is primarily determined by the backbone composition, whereas tailoring reactions seem to play a minor role.

      The manuscript presents a potentially highly useful bioinformatic pipeline for pyoverdine structure prediction, showcasing a commendable exploration of siderophore diversity. However, some of the claims made remain unsubstantiated. Overall, while the study holds promise, further validation and refinement are required to fulfill its potential impact on the field of bioinformatic structure prediction.

      Thank you for the supportive and encouraging words. We deeply appreciate your constructive comments and suggestions. 

      Reviewer #2 (Public Review):

      Pyoverdines, siderophores produced by many Pseudomonads, are one of the most diverse groups of specialized metabolites and are frequently used as model systems. Thousands of Pseudomonas genomes are available, but large-scale analyses of pyoverdines are hampered by the biosynthetic gene clusters (BGCs) being spread across multiple genomic loci and existing tools' inability to accurately predict amino acid substrates of the biosynthetic adenylation (A) domains. The authors present a bioinformatics pipeline that identifies pyoverdine BGCs and predicts the A domain substrates with high accuracy. They tackled a second challenging problem by developing an algorithm to differentiate between outer membrane receptor selectivity for pyoverdines versus other siderophores and substrates. The authors applied their dataset to thousands of Pseudomonas strains, producing the first comprehensive overview of pyoverdines and their receptors and predicting many new structural variants.

      The A domain substrate prediction is impressive, including the correction of entries in the MIBiG database. Their high accuracy came from a relatively small training dataset of A domains from 13 pyoverdine BGCs. The authors acknowledge that this small dataset does not include all substrates, and correctly point out that new sequence/structure pairs can be added to the training set to refine the prediction algorithm. 

      The authors could have been more comprehensive in finding their training set data. For instance, the authors claim that histidine "had not been previously documented in pyoverdines", but the sequenced strain P. entomophila L48, incorporates His (10.1007/s10534-009-9247-y). 

      Thank you for highlighting this issue. We agree that stating histidine has not been reported before in pyoverdine was incorrect. We have reviewed the full text and made the necessary corrections.

      The primary reason for excluding the sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) from the training set is that the pyoverdine structures of these strains were not determined solely through experimental methods. In these works, the pyoverdine structures were predicted based on the synthetic gene sequence using bioinformatic analysis, followed by structural analysis experiments based on this predicted structure. We suspect that this pre-prediction introduced biases into downstream analyses. Specifically, in the case of Pseudomonas entomophila L48, we discovered inaccuracies in the annotation of certain domains (see figures below). For example, the third A domain of the peptide chain in P. entomophila L48 pyoverdine was initially annotated with Dab specificity. However, upon closer examination, it appears to differ significantly from other Dab references (top) or Dab from our experimentally validated (right) domains (left panel in the figure below). By analyzing the interface (I) domain (10.1073/pnas.1903161116) in its predicted site, we suggest that it should actually recognize OHHis. The OHAsp domain of P. entomophila L48 reported in the paper is actually close in sequence similarity to the OHAsp domain (left panel in the figure below), while the Ala domain reported is more similar to the Ser domain (right panel in the figure below). For these reasons, we did not include this supervised pyoverdine structure analysis strain in the training set data.

      Author response image 1.

      The workflow cannot differentiate between different variants of Asp and OHOrn, and it's not clear if this is a limitation of the workflow, the training data, or both. 

      Thanks for pointing this out. It is generally challenging to differentiate between variants of the same amino acid (for all the algorithms existing to date). In this sense, it is a limitation of our workflow but also of all others. Nonetheless, we wish to stress that we observed feature sequence divergence (using the A motif4-5 region), which helped us to separate some (but not all) of the Asp and Orn variants. For example, separations between Asp variants are distinct (left panel in the figure below). To be on the conservative side, we only differentiated between OHAsp and Asp for our predictions, but differentiation between DOHAsp and OHAsp would also be possible. In the case of Orn variants, there was a clear separation between Orn and the OHOrn variants (right panel). In contrast, it was difficult to differentiate between the subgroups of OHOrn variants. We believe that no A domain prediction tool will be able to solve this issue. Instead, it would be important to include information on substrate-modifying enzymes in future approaches.

      Author response image 2.

      The prediction workflow holds up well in Burkholderiales A domains, however, they fail to mention in the main text that they achieved these numbers by adding more A domains to their training set.

      We thank the reviewers for this comment. We apologize for not having mentioned the training data set in the main text, although we described it in detail in the methods section (lines 714-732). We now provide more details on the analysis procedure in the main text (lines 307-313). It is important to note that we did not add more A domains to the training data set but built a new independent data set for Burkholderiales. The aim was to mirror the analysis we performed for pyoverdines with a completely new data set, featuring 124 A domains for training and 178 A domains as the test set.

      To validate their predictions, they elucidated structures of several new pyoverdines, and their predictions performed well. However, the authors did not include their MS/MS data, making it impossible to validate their structures. In general, the biggest limitation of the submitted manuscript is the near-empty methods section, which does not include any experimental details for the 20 strains or details of the annotation pipeline (such as "Phydist" and "Syndist"). The source code also does not contain the requisite information to replicate the results or re-use the pipeline, such as the antiSMASH version and required flags. That said, skimming through the source code and data (kindly provided upon request) suggests that the workflow itself is sound and a clear improvement over existing tools for pyoverdine BGC annotation.

      Thank you for highlighting these issues. We agree that the methods section is short. This is because the entire paper is a step-by-step methodological introduction to our pipeline. We have now carefully revised the main text to add the information requested by the reviewer. Moreover, we have included a supplementary file with the MS/MS data of the experimentally analyzed pyoverdine structures. Finally, we include a link to a one-click online notebook that can be used to replicate the annotation and substrate prediction results, together with a more detailed explanation of the code. See: https://drive.google.com/drive/folders/1JsfyPUGDTFo8BDDZk8JLSvKry8emzMhr?usp=drive_link

      Predicting outer membrane receptor specificity is likewise a challenging problem and the authors have made a promising achievement by finding specific gene regions that differentiate the pyoverdine receptor FpvA from FpvB and other receptor families. Their predictions were not tested experimentally, but the finding that only predicted FpvA receptors were proximate to the biosynthesis genes lends credence to the predictive power of the workflow. The authors find predicted pyoverdine receptors across an impressive 468 genera, an exciting finding for expanding the role of pyoverdines as public goods beyond Pseudomonas. However, whether or not these receptors can recognize pyoverdines (and if so, which structures!) remains to be investigated.

      Thank you for the supportive and encouraging words. The bioinformatic analysis and experimental testing of pyoverdine-receptor matching is complicated and it is not part of this paper. We treated it in a separate manuscript in which we developed an experimentally verified co-evolution algorithm that matches pyoverdines to receptors. With this algorithm, we can identify self-receptors (i.e. receptors used to take up the self-produced pyoverdine), and therefore establish pyoverdine sharing and interaction networks across strains in communities.

      Please see DOI:10.1101/2023.11.05.565711 for details.

      In all, the authors have assembled a rich dataset that will enable large-scale comparative genomic analyses. This dataset could be used by a variety of researchers, including those studying natural product evolution, public good eco/evo dynamics, and NRPS engineering.

      Thank you for the supportive and encouraging words. We are grateful for the reviewers’ instructive suggestions and comments.

      Reviewer #3 (Public Review):

      Summary:

      Secondary metabolites are produced by numerous microorganisms and have important ecological functions. A major problem is that neither the function of a secondary metabolite enzyme nor the resulting metabolite can be precisely predicted from gene sequence data.

      In the current paper, the authors addressed this highly relevant question.

      The authors developed a bioinformatic pipeline to reconstruct the complete secondary metabolism pathway of pyoverdines, a class of iron-scavenging siderophores produced by Pseudomonas spp. These secondary metabolites are biosynthesized by a series of nonribosomal peptide synthetases and require a specific receptor (FpvA) for uptake. The authors combined knowledge-guided learning with phylogeny-based methods to predict with high accuracy encoding NRPSs, substrate specificity of A domains, pyoverdine derivatives, and receptors. After validation, the authors tested their pipeline with sequence data from 1664 phylogenetically distinct Pseudomonas strains and were able to determine 18,292 enzymatic A domains involved in pyoverdine synthesis, reliably predicted 97.8% of their substrates, identified 188 different pyoverdine molecule structures and 4547 FpvA receptor variants belonging to 94 distinct groups. All the results and predictions were clearly superior to predictions that are based on antiSMASH. Novel pyoverdine structures were elucidated experimentally by UHPLC-HR-MS/MS.

      To assess the extendibility of the pipeline, the authors chose Burkholderiales as a test case which led to the results that the pipeline consistently maintains high prediction accuracy within Burkholderiales of 83% which was higher than for antiSMASH (67%).

      Together, the authors concluded that supervised learning based on a few known compounds produced by species from the same genus probably outperforms generalized prediction algorithms trained on many products from a diverse set of microbes for NRPS substrate predictions. As a result, they also show that both pyoverdine and receptor diversity have been vastly underestimated.

      Strengths:

      The authors developed a very useful bioinformatic pipeline with high accuracy for secondary metabolites, at least for pyoverdines. The pipelines have several advantages compared to existing pipelines like the extensively used antiSMASH program, e.g. it can be applied to draft genomes, shows reduced erroneous gene predictions, etc. The accuracy was impressively demonstrated by the discovery of novel pyoverdines whose structures were experimentally substantiated by UHPLC-HR-MS/MS.

      The manuscript is very well written, and the data and the description of the generation of pipelines are easy to follow.

      Weaknesses:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains.

      Thanks for your positive and encouraging comment. Regarding your only major comment, we think that the design concept of our pipeline has the potential to be applied to more complex non-ribosomal peptides. Currently, our method is tailored to accurately predict the structural composition of the Pseudomonas siderophore pyoverdine (see also response 3). A key point emphasized in our article is the importance of considering phylogeny in developing substrate prediction algorithms for A domains. Currently, the main challenge in advancing these algorithms is the limited availability of data on A domains and their corresponding substrates. However, with the future accumulation of more reference data, we are confident that the design principles of our method will enable precise predictions of the structural compositions of all products synthesized by non-ribosomal peptide synthetases (see our discussions in lines 634-645).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I believe that the manuscript would benefit from focusing solely on the task of improving pyoverdine predictions. This aspect alone is significant, and robustly supporting this claim would strengthen the manuscript. The diversity analysis provided is valuable and would undoubtedly benefit the scientific community. However, additional systematic comparisons with other methods are necessary. Furthermore, clarification of certain terms, such as 'feature-based' (e.g., whether it refers to NRPS domains or CDS), would enhance clarity.

      Thank you for the supportive and encouraging words. We followed the reviewer’s suggestion and now provide the requested method comparison, see also response 2 for details. Furthermore, we have carefully checked the main text to clarify terms whenever needed. Specifically, we now define the terms “feature sequence” and “feature sequence distance” in lines 227-229.  

      Additionally, several minor points could be improved upon:

      In line 85, clarification is needed on how pyoverdine genes were identified.

      Thank you for your thorough review. In the introduction section, we provided a brief overview of our work, while the detailed methodology is outlined in the results section on lines 160-174.

      In line 382, it would be helpful to know the source of the sequences.

      We agree and have now carefully revised the manuscript following your suggestions (lines 403-405).

      Line 392 could be explained more clearly. Does it mean that the authors used an hmm search to search pHMMs against each reference sequence?

      Thanks for your comment. Yes, we used an hmm search to search pHMMs against each reference sequence. We have now revised the manuscript to improve explanations (lines 413-418).

      Reviewer #2 (Recommendations For The Authors):

      The authors state they "elucidated the chemical structure of the 20 pyoverdines using culture-based methods combined with UHPLC-HR-MS/MS", so I was alarmed to see that KR and LB already published several of those structures in the cited paper. I hope that this "double dipping" will be fixed in a revision process.

      Thank you for pointing this out. We agree that we have not explained clearly enough what steps were conducted in this study and which data were used from a previous paper (https://doi.org/10.1007/s00216-022-03907-w). The genomes of the 20 strains used for the verification analysis (Fig. 3d) were sequenced as part of this study (access code now provided). 14 out of the 20 pyoverdine structures were elucidated with UHPLC-HR-MS/MS in this study. For 6 out of the 20 pyoverdines, we had structural information already at hand from the previous paper. We have now clarified these details in our manuscript (lines 276-280). 

      Thank you for providing the source code and data, and I hope that the final non-redundant dataset will be uploaded to Zenodo or another repository. Please deposit the 20 newly sequenced genomes to GenBank or another public repository. Please also show the UHPLC-HR-MS/MS data, preferably in the form of raw data uploaded to GNPS.

      We have followed the reviewer’s advice and deposited our data:

      - The sequences of the 20 newly sequenced strains are available from ENA under accession PRJEB76792.

      - The MS/MS plots of the 14 newly analyzed pyoverdines are shown in the Supplementary Materials.

      - We provide a one-click online notebook to allow readers to replicate the pyoverdine cluster annotation and substrate prediction of the 20 experimentally analyzed strains.

      I suggest adding "at least" or a similar qualifier when the 73 variants are mentioned unless the literature search was truly exhaustive. What were the criteria for inclusion of the 13 strains in Table S2? For instance, sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) were not included.

      Thank you for your comment. We have now carefully revised the manuscript following your suggestions (lines 291-295). Regarding the criteria for including the 13 strains in Table S2, we aimed to select strains with high credibility for inclusion in the training set. The primary reason for excluding the two strains from the training set is that their siderophore structures were analyzed through supervised experiments. We wanted to avoid any form of bias that bioinformatic pre-predictions could introduce to downstream analyses (see Response 13 for details).

      OHAsp in pyoverdines has been reported to arise from hydroxylation of Asp after it's already been activated by the A domain (10.1073/pnas.1903161116). Was there a clear difference between A domains that lead to Asp and OHAsp? Conversely, acetylation and formylation of OHOrn occur before adenylation. Can your workflow be used to differentiate cOHOrn, fOHOrn, and AcOHOrn, which are currently difficult to predict through genome mining?

      Thank you for these considerations. We treated these aspects in our response 8.  

      Throughout, define non-proteinogenic AA substrate abbreviations (ex: Rsc, Dab).

      Revised as per suggestion (lines 329-333).

      Additional line comments:

      189: Mention PhyloPhlAn in the main text.

      Revised as per suggestion (line 189).

      191: Define these filtering/selection criteria.

      Thanks for your comment, we have added the criteria in the main text (line 196 and line 198). 

      309, 620: An A domain presumably loading histidine is present in sequenced strain P. entomophila L48 (10.1007/s10534-009-9247-y). Please also clarify that Val has previously been seen in a pyoverdine (it is in Table S1) albeit not sequenced.

      We have clarified these aspects as per suggestion (lines 314-315 and line 630).

      310: The pipeline can "highlight" new substrates, but not identify them.

      Revised as per suggestion (line 295).

      354: Please clarify "13 amino acid substrates form the core of all the 188 pyoverdine structures", considering that 279 A domain substrates couldn't be predicted.

      Thanks for your comments. We have now clarified “our analysis found that 13 amino acids form the main structural substrates of all the 188 pyoverdine structures.” (lines 360-363)

      630: "discovered" implies that there is experimental evidence. I suggest something like "here we predicted 151 putatively new variants".

      Revised as per suggestion (line 648).

      Reviewer #3 (Recommendations For The Authors):

      Weakness:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains

      Thanks for your comment. Please see our Responses 3+13 above, where we treat this concern in detail. Moreover, we discussed the possibility of extension to other groups of secondary metabolites in our discussion. We believe that we deliver a balanced view on the applicability of our approach and the next steps to be taken.  

      Please comment on this aspect.

      Minor:

      (1)  When you speak about "synthesis" it is rather biosynthesis. Synthesis is chemical synthesis.

      Please replace all instances of the word synthesis with biosynthesis.

      Revised as per suggestion.

      (2)  Line 188: synthetase is rather synthetases

      Revised as per suggestion (line 191).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point reply in response to the Reviewer’s comments

      Reviewer #1

      Public review:

      [1] (a) Given that only a fraction of the FAPs express BDNF after injury, the authors need to demonstrate the specificity of the Prrx1-Cre for FAPs. This is particularly important because muscle stem cell also express GDNF receptors (Fig. 3C & D) and myogenic progenitors/satellite cells produce BDNF after nerve injury (Griesbeck et al., 1995 (PMID 8531223); Omura et al., 2005 (PMID 16221288)). (b) Moreover, as the authors point out, there are multipotent mesenchymal precursor cells in the nerve that migrate into the surrounding tissue following nerve injury and contribute to regeneration (Carr et al, PMID 30503141). Therefore, there are multiple possible sources of BDNF, highlighting the need to clearly demonstrate that FAP-derived BDNF is essential.

      - (a) As the Reviewer noted, both GDNF receptor expression and increased BDNF expression in response to nerve injury are detectable in both FAPs and muscle stem cells (MuSCs). Therefore, we agree with the Reviewer that demonstrating the specificity of Prrx1-Cre in FAPs is crucial to support our claim. In our previous publication (Kim et al., 2022), using Prrx1-Cre; Rosa-eYFP mice, we showed that while most of the CD31-CD45-Vcam1-Sca1+ FAPs are eYFP+, CD31-CD45-Vcam1+Sca1- MuSCs do not express eYFP (Liu et al., 2015; Kim et al., 2022) (Author response image 1). Additionally, genomic DNA PCR using mononuclear cells sorted from our Prrx1Cre; Bdnffl/fl mice showed that DNA recombination in the floxed Bdnf gene could only be detected in FAPs and CD31-CD45-Vcam1-Sca1- cells, but not in MuSCs (Author response image 2). This is consistent with a previous report that showed Prrx1-Cre activity in FAPs, pericytes, vascular smooth muscle cells (vSMCs) and tenocytes (Leinroth et al., 2022), where pericytes, vSMCs and tenocytes are included in the CD31-CD45-Vcam1-Sca1- population (Giordani et al., 2019). Together, these results demonstrate that while Prrx1-Cre is active in FAPs, it is absent in MuSCs.

      Author response image 1.

      Expression of eYFP in muscle-resident, lineage-negative, live mononuclear cells isolated from Prrx1Cre;RosaeYFP mice. Supplemental Figure 3A from Kim et al., 2022. Lin-: lineage-negative (CD31-CD45-); Neg.: Vcam1-Sca1-.

      Author response image 2.

      Recombination of the floxed Bdnf gene in the mononuclear cells sorted from muscles of Prrx1Cre; Bdnffl/fl or Bdnffl/fl mice. Genotypes and cell types sampled for each lane are specified. P4, P5, and P6 indicate primers used for each PCR. Lin+: lineage(CD31/CD45)-positive; DN: CD31-CD45-Vcam1-Sca1-.

      - (b) We appreciate and agree with the Reviewer’s comment that additional experiments are needed to confirm that FAP-derived BDNF is indeed essential for nerve regeneration, considering other potential cellular sources of BDNF, such as nerve-resident mesenchymal precursor cells. One possible experiment that could demonstrate the requirement of FAP-derived BDNF in nerve regeneration would be to transplant wild-type FAPs into our Prrx1Cre; Bdnf fl/fl mice and test whether the delay in nerve regeneration and remyelination is rescued, making the process similar to that in control mice. Unfortunately, since the genetic background of our Prrx1Cre; Bdnffl/fl mice is a mixture of B6, 129S4, and BALB/c, immune rejection of the transplanted cells may occur, which makes the experiment technically difficult. Another experimental approach could involve the use of a FAP-specific Cre mouse line, as we have mentioned in the Discussion of our original manuscript. However, such a line does not yet exist due to the lack of a marker gene that is expressed specifically in FAPs, but not in nerve-resident mesenchymal precursor cells. Overcoming such technical challenges and demonstrating the requirement of FAP-derived BDNF in nerve regeneration would significantly strengthen our report, though we regret that these methods are currently unavailable.

      [2] Similarly, the authors should provide some evidence that BDNF protein is produced by FAPs. All of their data for BDNF expression is based on mRNA expression and that appears to only be increased in a small subset of FAPs. Perhaps an immunostaining could be done to demonstrate up-regulation of BDNF in FAPs after injury.

      - We appreciate the Reviewer’s constructive comment. To demonstrate that BDNF protein is produced by FAPs upon nerve injury, we performed western blot analysis. FAPs were isolated from either sciatic nerve crush injury-affected muscles at 7 days post injury (dpi) or from the contralateral, uninjured muscles, and protein samples were prepared for SDS-PAGE and western blot using anti-BDNF, anti-PDGFRα and anti-GAPDH antibodies. As a result, while both nerve injury-affected and uninjured muscle-derived FAPs expressed PDGFRα, the mature form of BDNF protein was only detected in nerve injury-affected FAPs, showing that BDNF is indeed expressed in FAPs at the protein level after injury. We have added this new result as Figure 4F in the New Figure 4 with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) in our manuscript to include the new results in detail.

      [3] The suggestion that Schwann cell-derived GDNF is responsible for upregulation of BDNF in the FAPs is indirect, based largely on the data showing that injection of GDNF into the muscle is sufficient to up-regulate BDNF (Fig. 4F & G). However, to more directly connect the 2 observations in a causal way, the authors should inject a Ret/GDNF antagonist, such as a Ret-Fc construct, then measure the BDNF levels.

      - We appreciate the Reviewer’s constructive comment, and we agree that testing the necessity of GDNF/RET signaling in BDNF upregulation is crucial to link the expression of the two neurotrophic factors in a causal way. As a means to antagonize GDNF/RET signaling, we injected anti-GDNF antibodies into the tibialis anterior and gastrocnemius muscles following sciatic nerve crush injury to block the activity of intramuscular GDNF protein. As a result, although the differences were not statistically significant, we observed a tendency towards decreased Bdnf mRNA expression upon anti-GDNF injection compared to IgG controls. We have added this new result as New Figure 4—figure supplement 2, and revised our manuscript to include the details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). We have also changed the title of New Figure 4 (line 332) to encompass the new results. We are aware that further experiments that may involve increasing the number of animals tested, increasing the antibody injection dosage or frequency, or implementation of genetic models such as Plp1CreER; Gdnffl/fl should be carried out to validate our hypothesis with statistical significance. Unfortunately, due to limited time, resources, and research funds, we were unable to perform such additional experiments. We hope that the Reviewer understands these limitations.

      [4] (a) In assessing the regeneration after nerve crush, the authors focus on remyelination, for example, assessing CMAP and g-ratios. However, they should also quantify axon regeneration, which can be done distal to the crush injury at earlier time points, before the 6 weeks scored in their study. Evaluating axon regeneration, which occurs prior to remyelination, would be especially useful because BDNF can act on both Schwann cells, to promote myelination, and axons, enhancing survival and growth. (b) They could also evaluate the stability of the neuromuscular junctions, particularly if a denervation was done with the conditional knock outs, although that may be a bit beyond the scope of this study.

      - (a) As the Reviewer mentioned, BDNF is known to act on both Schwann cells and axons, where it promotes myelination and axonal growth, respectively (Oudega and Hagg, 1998; Zhang et al., 2000; Chan et al., 2001; Xiao et al., 2009; English et al., 2013). We fully agree with the Reviewer’s comment that quantification of axon regeneration, which could be achieved through immunostaining of the distal part of the sciatic nerve at earlier time points after injury, would shed light on whether FAP-derived BDNF can also contribute to axon regeneration in addition to remyelination. Unfortunately, we could not perform such additional experiments within the limited time frame, since preparing sufficient numbers of control and conditional knockout mice that match the age groups used in this study (3-4 months old), followed by waiting an additional 2-4 weeks after nerve crush injury for sample collection, and subsequent immunostaining for quantification could take almost 6 months in total. We hope that the Reviewer understands this limitation.

      - (b) We appreciate the Reviewer’s constructive comment. Although the number of animals used for neuromuscular junction (NMJ) analyses was not sufficient, we had briefly examined the structure of NMJs at 4 weeks post nerve crush injury in control (Ctrl) and conditional knockout (cKO) mice as a preliminary experiment. As a result, no significant differences were observed between Ctrl and cKO mice in terms of NMJ morphology and innervation (Author response image 3). 

      Author response image 3.

      Structures of neuromuscular junctions from Ctrl vs cKO mice at 4 weeks post nerve crush injury. Whole-mount immunostaining was done using the extensor digitorum longus muscles that were affected by sciatic nerve crush injury. Samples were stained with α-bungarotoxin (green), neurofilament (red), and synaptophysin (blue). Scale bar: 50 μm. 

      Going back to part (a) of this Reviewer’s comment, considering the data presented in Author response image 3, where innervation of axons into acetylcholine receptor clusters was not significantly different between Ctrl and cKO mice, FAP-derived BDNF may not be critical for axonal growth upon nerve injury. Although we acknowledge that additional experiments are required to draw a meaningful conclusion on this point, we could not perform such additional experiments due to insufficient time and resources.

      We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] In citing the ability of BDNF to promote Schwann cell myelination the authors should include Chan et al., 2001 (PMID 11717413) in addition to the Zhang et al, 2000 and Xiao et al, 2009 references.

      - We apologize for omitting the reference mentioned by the Reviewer. We have added the suggested reference in our revised manuscript (lines 395, 425, and 517).

      Reviewer #2

      Public review:

      [1] Although, I find the data the authors generated enough for their claims. I do see them as relatively poor, and (a) a complementary analysis of protein expression would strengthen the paper through immunostaining of the different genes mentioned for FAPs and Schwann cells. The model is entirely supported by measuring mRNA levels and negative regulation of gene expression in specific cells. Additionally, (b) what happens to the structure of the neuromuscular junction after regeneration when GDNF or BDNF expression is reduced? (c) The determination of decreasing levels of FAPs BDNF mRNA during aging is interesting; is the gain of BDNF expression in FAPs reverting the phenotype?

      - (a) We appreciate and agree with the Reviewer’s comment that validation of BDNF protein expression in FAPs and GDNF protein expression in Schwann cells upon nerve injury would strengthen this paper. Regarding GDNF protein expression in Schwann cells upon nerve injury, it has already been demonstrated by previous studies (Höke et al., 2002; Xu et al., 2013). For BDNF protein expression in FAPs upon nerve injury, we performed western blot analysis for validation, as mentioned in the response to Reviewer #1 Public review [2]. The results showed that while the mature form of BDNF protein could not be readily detected in FAPs isolated from uninjured muscles, it could be detected in FAPs isolated from sciatic nerve crush injury-affected muscles at 7 days post injury. We have added the new result as Figure 4F in the New Figure 4 with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) in our manuscript to include the new results in detail.

      - (b) Though the data is preliminary, we examined the structures of neuromuscular junctions (NMJs) from control and Prrx1Cre; Bdnf fl/fl mice at 4 weeks post injury in the extensor digitorum longus muscles, as mentioned in the response to Reviewer #1 Public review [4](b). As a result, we could not identify significant differences between control versus Prrx1Cre; Bdnf fl/fl mice, where BDNF expression is reduced specifically in Prrx1-expressing cells, including FAPs (Author response image 3). Since other cellular sources of BDNF, such as Schwann cells, exist, regeneration of the NMJs may not have been as significantly affected as remyelination in our Prrx1Cre; Bdnf fl/fl mice. However, further experiments with a sufficient number of mice and more observation time points are required to statistically validate this hypothesis in detail. Unfortunately, preparing samples for such additional analyses would take more than four months, as we need to produce sufficient numbers of control and Prrx1Cre; Bdnf fl/fl mice that match the age groups used in this study. We hope that the Reviewer understands our limitation.

      Regarding analyzing NMJ structures after regeneration affected by reduced GDNF levels, using genetic models such as Plp1CreER; Gdnffl/fl mice would be appropriate, as we have used the Prrx1Cre; Bdnffl/fl mice in this study to reduce BDNF levels produced by FAPs. Unfortunately, we do not have the Gdnffl mice, and obtaining these mice to produce Plp1CreER; Gdnffl/fl mice and performing the additional experiment would take too much time for this current revision. In a further study, we will try to perform the additional experiment by obtaining the required mouse line. We hope that the Reviewer understands our limitation.

      - (c) We thank the Reviewer for highlighting this point. In this paper, we have shown that BDNF expression upon nerve injury is decreased in aged FAPs compared to young adult FAPs, and suggested that this may be one of the causes of the delayed nerve regeneration phenotype in aged mice. Previously, it has been reported that while intramuscular injection of BDNF accelerates nerve regeneration, intramuscular injection of anti-BDNF antibodies delays the regeneration process (Zheng et al., 2016). This implies that intramuscular levels of active BDNF can significantly influence the speed of nerve regeneration. Therefore, the gain of BDNF expression in aged FAPs may contribute to reversing the delayed nerve regeneration phenotype in aged mice, since it would result in additional supply of active, intramuscular BDNF, which has previously been shown to accelerate nerve regeneration. Though experimental validation is required to support such a claim, we could not obtain sufficient numbers of aged mice within the limited time frame. We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] The authors should include the experimental design and several drawings in the leading figures indicating, for example, how remyelination after injury was quantified and how the response of regenerated sciatic nerve to a depolarizing stimulus was studied.

      - We apologize for any confusion caused by insufficient information provided in the leading figures. Unfortunately, due to limited space, we could not add experimental designs or drawings in the leading figures. Instead, to do our best to comply with the Reviewer’s comment, we have revised the figure legends in the leading figures so that the experimental designs or diagrams can be referred to in the figure supplements.

      We hope that the Reviewer understands this limitation.

      Reviewer #3

      Public review:

      [1] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([2]), as the same comment is repeated.

      [2] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its down-stream signaling in FAP cells. There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation In FAP cells. The data that GDNF signaling is inducing the synthesis and secretion of BDNF is also not conclusive.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([3]), as the same comment is repeated.

      Recommendations for the authors:

      [1] Although this is a novel study and contains very well-performed parts, the GDNF section is preliminary and requires additional experimentation. In the introduction authors describe well FAPs but even do not mention how GDNF is signaling. Moreover, the reader may get an impression that Ras-MAPK pathway is the only or at least the main GDNF signaling pathway. In fact, for neurons Akt and Src signaling pathways play also crucial role.

      - We apologize for the missing content in the Introduction section of our manuscript and for any confusion caused by our misleading description of the GDNF signaling pathway. We have revised our manuscript to include the GDNF signaling pathway in the Introduction section, along with a description of other downstream signaling pathways of GDNF that are known to play crucial roles, as mentioned by the Reviewer (lines 115-130). Additionally, we changed the expression in the Results section to avoid making any misleading impressions (lines 318-319).

      [2] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - We appreciate the Reviewer for the constructive comment. Though we fully agree with the Reviewer that validating the expression of RET and GFRα1 proteins in FAPs is needed, we were unable to obtain the antibodies required for such experiments within the limited time frame for this revision. We hope that the Reviewer understands our limitation. Although we could not directly show the expression of those GDNF receptor genes at the protein level in FAPs, based on the result where intramuscular GDNF injection could sufficiently induce Bdnf expression in FAPs compared to PBS control in the absence of nerve damage, it is likely that GDNF receptors are indeed expressed at the protein level in FAPs, since if otherwise, FAPs would not have been able to respond to the injected GDNF protein. Nevertheless, in a future study, we will try to validate the protein-level expression of GDNF receptors in FAPs to comply with the Reviewer’s suggestion and to further support this study.

      [3] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its down-stream signaling in FAP cells. Authors can monitor activation of MAPK pathway by detecting phospho-Erk and PI3 kinase-Akt pathway measuring phospho-S6 using immunohistochemistry. We can recommend to use the following antibodies: pErk1/2 (1:300, Cell Signaling, Cat# 4370L RRID:AB_2297462), pS6 (1:300, Cell Signaling, Cat# 4858L RRID:AB_1031194). These experiments are crucial because RET and GFRa1 proteins maybe not expressed at the sufficient level on the cell surface.

      - We sincerely appreciate the Reviewer’s constructive comment. In this study, we suggested that the GDNF-BDNF axis within FAPs would signal through the MAPK pathway based on the bioinformatic analysis of our single cell RNA-seq data and matching the results with the previously known pathways. We fully agree that monitoring the activation of the MAPK pathway and the PI3K-Akt pathway by immunohistochemistry would experimentally demonstrate whether GDNF can activate those pathways within FAPs through GFRα1/RET activation. Unfortunately, we could not obtain the antibodies suggested by the Reviewer for this revision due to insufficient research funds and a limited time frame. We hope that the Reviewer understands our limitation. In future studies, we will try to validate the detailed molecular pathway that mediates the GDNF-BDNF axis in FAPs by incorporating the methodology suggested by the Reviewer, along with implementation of genetic models such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl or Prrx1Cre; Gfra1fl/fl to validate whether Schwann cell-derived GDNF can actually signal through its canonical receptor RET/GFRα1 expressed in FAPs to induce expression of BDNF upon nerve injury.

      [4] (a) There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. Authors can use GDNF blocking antibodies, siRNA or use RET or GFRa1 cKO mice to delete them from FAP cells. (b) The data that GDNF signaling is inducing the synthesis and secretion of BDNF is also not conclusive. Authors should show that GDNF injection is increasing BDNF protein levels in FAPs. To get sufficient material for ELISA detection of BDNF is perhaps problematic. However, authors can use BDNF antibodies from Icosagen company and use IHC.

      - (a) We appreciate the Reviewer for the critical comment. As mentioned in the reply for Reviewer #1 Public review [3], we used GDNF blocking antibodies to reduce GDNF signaling within the tibialis anterior and gastrocnemius muscles by intramuscular injection after sciatic nerve crush injury, and included the result as a new figure supplement in our revised manuscript (New Figure 4—figure supplement 2) with its details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). Though the results were not statistically significant, intramuscular injection of anti-GDNF antibodies showed a tendency toward reduced Bdnf expression in FAPs, compared to IgG controls. As mentioned in the reply for Reviewer #1 Public review [3], and as suggested by the Reviewer, using cKO mice such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl, or Prrx1Cre; Gfra1fl/fl mice would further validate the GDNF-BDNF axis suggested in this study, likely with statistical significance. Unfortunately, obtaining these genetic models within the limited time frame of this current revision is not feasible. We will try to adopt such models in our future study to validate the role of Schwann cell-derived GDNF in inducing BDNF expression in FAPs via activation of RET/GFRα1.  

      - (b) We thank the Reviewer for the constructive comment. Though we fully agree that the experiment suggested by the Reviewer would validate the synthesis and secretion of BDNF protein by GDNF signaling in FAPs, we were not able to perform it due to a lack of research funds to obtain a sufficient amount of the GDNF protein. We hope that the Reviewer understands our limitation. Still, combining the results from New Figure 4H in this study with the New Figure 4F, where GDNF injection induced Bdnf mRNA expression in FAPs, and BDNF protein expression in FAPs in response to nerve injury was demonstrated via western blot, we anticipate that GDNF injection would increase BDNF protein levels in FAPs, though direct validation of this statement would require conducting the additional experiments mentioned by the Reviewer.

      References

      Chan JR, Cosgaya JM, Wu YJ, and Shooter EM (2001). Neurotrophins are key mediators of the myelination program in the peripheral nervous system. Proceedings of the National Academy of Sciences 98:14661-14668.

      English AW, Liu K, Nicolini JM, Mulligan AM, and Ye K (2013). Small-molecule trkB agonists promote axon regeneration in cut peripheral nerves. Proc Natl Acad Sci U S A 110:16217-22. doi:10.1073/pnas.1303646110

      Giordani L, He GJ, Negroni E, Sakai H, Law JY, Siu MM, Wan R, Corneau A, Tajbakhsh S, and Cheung TH (2019). High-dimensional single-cell cartography reveals novel skeletal muscle-resident cell populations. Molecular Cell 74:609-621. e6.

      Höke A, Gordon T, Zochodne D, and Sulaiman O (2002). A decline in glial cell line-derived neurotrophic factor expression is associated with impaired regeneration after long-term Schwann cell denervation. Experimental neurology 173:77-85.

      Kim J-H, Kang J-S, Yoo K, Jeong J, Park I, Park JH, Rhee J, Jeon S, Jo Y-W, and Hann S-H (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight 7:

      Leinroth AP, Mirando AJ, Rouse D, Kobayahsi Y, Tata PR, Rueckert HE, Liao Y, Long JT, Chakkalakal JV, and Hilton MJ (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports 39:

      Liu L, Cheung TH, Charville GW, and Rando TA (2015). Isolation of skeletal muscle stem cells by fluorescence-activated cell sorting. Nature protocols 10:1612-1624.

      Oudega M, and Hagg T (1998). Neurotrophins promote regeneration of sensory axons in the adult rat spinal cord. Brain Research 818:431-438. doi:10.1016/S0006-8993(98)01314-6

      Xiao J, Wong AW, Willingham MM, Kaasinen SK, Hendry IA, Howitt J, Putz U, Barrett GL, Kilpatrick TJ, and Murray SS (2009). BDNF exerts contrasting effects on peripheral myelination of NGF-dependent and BDNF-dependent DRG neurons. J Neurosci 29:4016-22. doi:10.1523/JNEUROSCI.3811-08.2009

      Xu P, Rosen KM, Hedstrom K, Rey O, Guha S, Hart C, and Corfas G (2013). Nerve injury induces glial cell line-derived neurotrophic factor (GDNF) expression in Schwann cells through purinergic signaling and the PKC-PKD pathway. Glia 61:1029-1040.

      Zhang JY, Luo XG, Xian CJ, Liu ZH, and Zhou XF (2000). Endogenous BDNF is required for myelination and regeneration of injured sciatic nerve in rodents. European Journal of Neuroscience 12:4171-4180. doi:10.1111/j.1460-9568.2000.01312.x

      Zheng J, Sun J, Lu X, Zhao P, Li K, and Li L (2016). BDNF promotes the axonal regrowth after sciatic nerve crush through intrinsic neuronal capability upregulation and distal portion protection. Neuroscience letters 621:1-8.

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      We carefully read through the second-round reviews and the additional reviews. To us, the review process is somewhat unusual and very much dominated by referee 2, who aggressively insists that we mixed up the trigeminal nucleus and inferior olive and that as a consequence our results are meaningless. We think the stance of referee 2 and the focus on one single issue (the alleged mix-up of trigeminal nucleus and inferior olive) is somewhat unfortunate and leaves out many of our findings, and we debated at length how to deal with further revisions. In the end, we decided to again give priority to addressing the criticism of referee 2, because it is hard to go on with a heavily attacked paper without resolving the matter at stake. The following is a summary of what we did:

      Additional experimental work:

      (1) We checked if the peripherin-antibody indeed reliably identifies climbing fibers.

      To this end, we sectioned the elephant cerebellum and stained sections with the peripherin-antibody. We find: (i) the cerebellar white matter is strongly reactive for peripherin-antibodies, (ii) cerebellar peripherin-antibody staining has an axonal appearance, (iii) cerebellar Purkinje cell somata appear to be ensheathed by peripherin-antibody staining, and (iv) the peripherin-antibody reactivity gradually decreases from Purkinje cell somata to the pia in the cerebellar molecular layer. This work is shown in our revised Figure 2. All four features align with the distribution of climbing fibers (which arrive through the white matter, are axons, ensheathe Purkinje cell somata, and innervate Purkinje cells proximally without reaching the pia). In line with previous work, which showed similar cerebellar staining patterns in several species (Errante et al. 1998), we conclude that elephant climbing fibers are strongly reactive for peripherin-antibodies.

      (2) We delineated the elephant olivo-cerebellar tract.

      The strong peripherin-antibody reactivity of elephant climbing fibers enabled us to delineate the elephant olivo-cerebellar tract. We find the elephant olivo-cerebellar tract is a strongly peripherin-antibody reactive, well-delineated fiber tract several millimeters wide and about a centimeter in height. The unstained olivo-cerebellar tract has a greyish appearance. In the anterior regions of the olivo-cerebellar tract, we find that peripherin-antibody reactive fibers run in the dorsolateral brainstem and approach the cerebellar peduncle, where the tract gradually diminishes in size, presumably because climbing fibers discharge into the peduncle. Indeed, peripherin-antibody reactive fibers can be seen entering the cerebellar peduncle. Towards the posterior end of the peduncle, the olivo-cerebellar tract disappears (in the dorsal brainstem directly below the peduncle). We note that the olivo-cerebellar tract was referred to as the spinal trigeminal tract by Maseko et al. 2013. We think the tract in question cannot be the spinal trigeminal tract for two reasons: (i) This tract is the sole brainstem source of peripherin-positive climbing fibers entering the peduncle/the cerebellum; this is the defining characteristic of the olivo-cerebellar tract. (ii) The tract in question is much smaller than the trigeminal nerve, disappears posterior to where the trigeminal nerve enters the brainstem (see below), and has no continuity with the trigeminal nerve; continuity with the trigeminal nerve, however, is the defining characteristic of the spinal trigeminal tract.

      The anterior regions of the elephant olivo-cerebellar tract are similar to the anterior regions of the olivo-cerebellar tract of other mammals in their dorsolateral position and their relation to the cerebellar peduncle. In its more posterior parts, the elephant olivo-cerebellar tract continues for a long distance (~1.5 cm) in roughly the same dorsolateral position and enters the serrated nucleus that we previously identified as the elephant inferior olive. The more posterior parts of the elephant olivo-cerebellar tract therefore differ from the more posterior parts of the olivo-cerebellar tract of other mammals, which follows a ventromedial trajectory towards a ventromedially situated inferior olive. The implication of our delineation of the elephant olivo-cerebellar tract is that we correctly identified the elephant inferior olive.

      (3) An in-depth analysis of peripherin-antibody reactivity also indicates that the trigeminal nucleus receives no climbing fiber input.

      We also studied the peripherin-antibody reactivity in and around the trigeminal nucleus. We had also noted in the previous submission that the trigeminal nucleus is weakly positive for peripherin, but that the staining pattern is uniform and not the type of axon bundle pattern that is seen in the inferior olive of other mammals. To us, this observation already argued against the presence of climbing fibers in the trigeminal nucleus. We also noted that the myelin stripes of the trigeminal nucleus were peripherin-antibody-negative. In the context of our olivo-cerebellar tract tracing we now also scrutinized the surroundings of the trigeminal nucleus for peripherin-antibody reactivity. We find that the ventral brainstem surrounding the trigeminal nucleus is devoid of peripherin-antibody reactivity. Accordingly, no climbing fibers (which we have shown to be strongly peripherin-antibody-positive, see our point 1) arrive at the trigeminal nucleus. The absence of climbing fiber input indicates that previous work that identified the (trigeminal) nucleus as the inferior olive (Maseko et al. 2013) is unlikely to be correct.

      (4) We characterized the entry of the trigeminal nerve into the elephant brain.

      To better understand how trigeminal information enters the elephant’s brain, we characterized the entry of the trigeminal nerve. This analysis indicated to us that the trigeminal nerve is not continuous with the olivo-cerebellar tract (the spinal trigeminal tract of Maseko et al. 2013) as previously claimed by Maseko et al. 2013. We show some of this evidence in Referee-Figure 1 below. The reason we think the trigeminal nerve is discontinuous with the olivo-cerebellar tract is the size discrepancy between the two structures. We first show this for the tracing data of Maseko et al. 2013. In the Maseko et al. 2013 data the trigeminal nerve (Referee-Figure 1A, their plate Y) has 3-4 times the diameter of the olivo-cerebellar tract (the alleged spinal trigeminal tract, Referee-Figure 1B, their plate Z). Note that most if not all trigeminal fibers are thought to continue from the nerve into the trigeminal tract (see our rat data below). We plotted the diameter of the trigeminal nerve and the diameter of the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) from the Maseko et al. 2013 data (Referee-Figure 1C) and we found that the olivo-cerebellar tract has a fairly consistent diameter (46 ± 9 mm², mean ± SD). Statistical considerations and anatomical evidence suggest that the tracing of the trigeminal nerve into the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) is almost certainly wrong. The most anterior point of the alleged spinal trigeminal tract has a diameter of 51 mm², which is more than 15 standard deviations different from the most posterior diameter (194 mm²) of the trigeminal nerve. For this assignment to be correct, three-quarters of trigeminal nerve fibers would have to spontaneously disappear, something that does not happen in the brain. 
We also made similar observations in the African elephant Bibi, where the trigeminal nerve (Referee-Figure 1D) is much larger in diameter than the olivo-cerebellar tract (Referee-Figure 1E). We could also show that the olivo-cerebellar tract disappears into the peduncle posterior to where the trigeminal nerve enters (Referee-Figure 1F). Our data are very similar to those of Maseko et al., indicating that their outlining of structures was done correctly. What appears to have been oversimplified is the assignment of structures as continuous. We also quantified the diameter of the trigeminal nerve and the spinal trigeminal tract in rats (from the Paxinos & Watson atlas; Referee-Figure 1G); as expected, we found that the trigeminal nerve and spinal trigeminal tract diameters are essentially continuous.
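
The size argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: it uses only the four numbers quoted in the text (tract mean 46, tract SD 9, most anterior tract value 51, most posterior nerve value 194, all in the units reported above), and the variable and function names are hypothetical. Reading the size ratio as a proportion of fibers assumes that fiber number scales with the measured size.

```python
# Values quoted in the response (units as reported in the text).
tract_mean = 46.0        # olivo-cerebellar tract, mean
tract_sd = 9.0           # olivo-cerebellar tract, standard deviation
tract_anterior = 51.0    # most anterior point of the alleged spinal trigeminal tract
nerve_posterior = 194.0  # most posterior measurement of the trigeminal nerve


def sd_units(value, reference, sd):
    """Distance between value and reference, expressed in units of sd."""
    return (value - reference) / sd


def fraction_lost(before, after):
    """Fraction of the 'before' size that is missing in 'after'."""
    return 1.0 - after / before


# Gap between nerve and tract, measured in tract standard deviations.
gap_in_sd = sd_units(nerve_posterior, tract_anterior, tract_sd)

# Share of the nerve's size that would have to vanish at the transition.
lost = fraction_lost(nerve_posterior, tract_anterior)

print(f"gap: {gap_in_sd:.1f} SD")  # gap: 15.9 SD
print(f"size lost: {lost:.0%}")    # size lost: 74%
```

Under these assumptions the gap is about 15.9 standard deviations and the reduction is about 74%, consistent with the "more than 15 standard deviations" and "three-quarters" stated in the text.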

      In our hands, the trigeminal nerve does not continue into a well-defined tract that could be traced after its entry. In this regard, it differs from both the olivo-cerebellar tract of the elephant and the spinal trigeminal tract of the rodent, both of which are well delineated. We think the absence of a well-delineated spinal trigeminal tract in elephants might have contributed to the putative tracing error highlighted in our Referee-Figure 1A-C.

      We conclude that a size mismatch indicates trigeminal fibers do not run in the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013).

      Author response image 1.

      The trigeminal nerve is discontinuous with the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013)

      A, Trigeminal nerve (orange) in the brain of African elephant LAX as delineated by Maseko et al. 2013 (coronal section; their plate Y).

      B, Most anterior appearance of the spinal trigeminal tract of Maseko et al. 2013 (blue; coronal section; their plate Z). Note the much smaller diameter of the spinal trigeminal tract compared to the trigeminal nerve shown in A, which argues against the continuity of the two structures. Indeed, our peripherin-antibody staining showed that the spinal trigeminal tract of Maseko corresponds to the olivo-cerebellar tract and is discontinuous with the trigeminal nerve.

      C, Plot of the trigeminal nerve and olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) diameters along the anterior-posterior axis. The trigeminal nerve is much larger in diameter than the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013). A, B measurements, for which sections are shown in panels A and B respectively. The olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) has a consistent diameter; data replotted from Maseko et al. 2013. At mm 25 the inferior olive appears.

      D, Trigeminal nerve entry in the brain of African elephant Bibi; our data, coronal section, the trigeminal nerve is outlined in orange, note the large diameter.

      E, Most anterior appearance of the olivo-cerebellar tract in the brain of African elephant Bibi; our data, coronal section, approximately 3 mm posterior to the section shown in D, the olivo-cerebellar tract is outlined in blue. Note the smaller diameter of the olivo-cerebellar tract compared to the trigeminal nerve, which argues against the continuity of the two structures.

      F, Plot of the trigeminal nerve and olivo-cerebellar tract diameter along the anterior-posterior axis. The nerve and olivo-cerebellar tract are discontinuous and the trigeminal nerve is much larger in diameter than the olivocerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013); our data. D, E measurements, for which sections are shown in panels D and E respectively. At mm 27 the inferior olive appears.

      G, In the rat the trigeminal nerve is continuous in size with the spinal trigeminal tract. Data replotted from Paxinos and Watson.

      Reviewer 2 (Public Review):

      As indicated in my previous review of this manuscript (see above), it is my opinion that the authors have misidentified, and indeed switched, the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex (Vsens). It is this specific point only that I will address in this second review, as this is the crucial aspect of this paper - if the identification of these nuclear complexes in the elephant brainstem by the authors is incorrect, the remainder of the paper does not have any scientific validity.

      Comment: We agree with the referee that it is most important to sort out the identity of the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex.

      Change: We did additional experimental work to resolve this matter as detailed at the beginning of our response. Specifically, we ascertained that elephant climbing fibers are strongly peripherin-positive. Based on elephant climbing fiber peripherin-reactivity we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as the inferior olive to the cerebellum (the referee refers to this structure as the trigeminal nuclear complex). We also found that the trigeminal nucleus (the structure the referee refers to as the inferior olive) appears to receive no climbing fibers. We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2013 was erroneous (Author response image 1). These novel findings support our ideas but are very difficult to reconcile with the referee’s partitioning scheme.

      The authors, in their response to my initial review, claim that I "bend" the comparative evidence against them. They further claim that as all other mammalian species exhibit a "serrated" appearance of the inferior olive, and as the elephant does not exhibit this appearance, that what was previously identified as the inferior olive is actually the trigeminal nucleus and vice versa. 

      For convenience, I will refer to IOM and VsensM as the identification of these structures according to Maseko et al (2013) and other authors and will use IOR and VsensR to refer to the identification forwarded in the study under review.

      The IOM/VsensR certainly does not have a serrated appearance in elephants. Indeed, from the plates supplied by the authors in response (Referee Fig. 2), the cytochrome oxidase image supplied and the image from Maseko et al (2013) show a very similar appearance. There is no doubt that the authors are identifying structures that closely correspond to those provided by Maseko et al (2013). It is solely a contrast in what these nuclear complexes are called and the functional sequelae of the identification of these complexes (are they related to the trunk sensation or movement controlled by the cerebellum?) that is under debate.

      Elephants are part of the Afrotheria, thus the most relevant comparative data to resolve this issue will be the identification of these nuclei in other Afrotherian species. Below I provide images of these nuclear complexes, labelled in the standard nomenclature, across several Afrotherian species. 

      (A) Lesser hedgehog tenrec (Echinops telfairi) 

      Tenrec brains are the most intensively studied of the Afrotherian brains, these extensive neuroanatomical studies having been undertaken primarily by Heinz Künzle. Below I append images (coronal sections stained with cresyl violet) of the IO and Vsens (labelled in the standard mammalian manner) in the lesser hedgehog tenrec. It should be clear that the inferior olive is located in the ventral midline of the rostral medulla oblongata (just like the rat) and that this nucleus is not distinctly serrated. The Vsens is located in the lateral aspect of the medulla skirted laterally by the spinal trigeminal tract (Sp5). These images and the labels indicating structures correlate precisely with those provided by Künzle (1997, 10.1016, see his Figure 1K,L). Thus, in the first case of a related species, there is no serrated appearance of the inferior olive, the location of the inferior olive is confirmed through connectivity with the superior colliculus (a standard connection in mammals) by Künzle (1997), and the location of Vsens is what is considered to be typical for mammals. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Review Image 1.

      (B) Giant otter shrew (Potomogale velox) 

      The otter shrews are close relatives of the Tenrecs. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see hints of the serration of the IO as defined by the authors, but we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      Peer Review Image 2.

      (C) Four-toed sengi (Petrodromus tetradactylus) 

      The sengis are close relatives of the Tenrecs and otter shrews, these three groups being part of the Afroinsectiphilia, a distinct branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see vague hints of the serration of the IO (as defined by the authors), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Review Image 3.

      (D) Rock hyrax (Procavia capensis) 

      The hyraxes, along with the sirens and elephants form the Paenungulata branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per the standard mammalian anatomy. Here we see hints of the serration of the IO (as defined by the authors), but we also see evidence of a more "bulbous" appearance of subnuclei of the IO (particularly the principal nucleus), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Review Image 4.

      (E) West Indian manatee (Trichechus manatus) 

      The sirens are the closest extant relatives of the elephants in the Afrotheria. Below I append images of cresyl violet (top) and myelin (bottom) stained coronal sections (taken from the University of Wisconsin-Madison Brain Collection, https://brainmuseum.org, and while quite low in magnification they do reveal the structures under debate) through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see the serration of the IO (as defined by the authors). Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      Peer Review Image 5.

      These comparisons and the structural identification, with which the authors agree as they only distinguish the elephants from the other Afrotheria, demonstrate that the appearance of the IO can be quite variable across mammalian species, including those with a close phylogenetic affinity to the elephants. Not all mammal species possess a "serrated" appearance of the IO. Thus, it is more than just theoretically possible that the IO of the elephant appears as described prior to this study. 

      So what about elephants? Below I append a series of images from coronal sections through the African elephant brainstem stained for Nissl, myelin, and immunostained for calretinin. These sections are labelled according to standard mammalian nomenclature. In these complete sections of the elephant brainstem, we do not see a serrated appearance of the IOM (as described previously and in the current study by the authors). Rather the principal nucleus of the IOM appears to be bulbous in nature. In the current study, no image of myelin staining in the IOM/VsensR is provided by the authors. However, in the images I provide, we do see the reported myelin stripes in all stains - agreement between the authors and reviewer on this point. The higher magnification image to the bottom left of the plate shows one of the IOM/VsensR myelin stripes immunostained for calretinin, and within the myelin stripes axons immunopositive for calretinin are seen (labelled with an arrow). The climbing fibres of the elephant cerebellar cortex are similarly calretinin immunopositive (10.1159/000345565). In contrast, although not shown at high magnification, the fibres forming the Sp5 in the elephant (in the Maseko description, unnamed in the description of the authors) show no immunoreactivity to calretinin. 

      Peer Review Image 6.

      Comment: We appreciate the referee’s additional comments. We concede the possibility that some relatives of elephants have a less serrated inferior olive than most other mammals. We maintain, however, that the elephant inferior olive (our Figure 1J) has the serrated appearance seen in the vast majority of mammals.

      Change: None.

      Peripherin Immunostaining 

      In their revised manuscript the authors present immunostaining of peripherin in the elephant brainstem. This is an important addition (although it does replace the only staining of myelin provided by the authors, which is unusual as the word myelin is in the title of the paper) as peripherin is known to specifically label peripheral nerves. In addition, as pointed out by the authors, peripherin also immunostains climbing fibres (Errante et al., 1998). The understanding of this staining is important in determining the identification of the IO and Vsens in the elephant, although it is not ideal for this task as there is some ambiguity. Errante and colleagues (1998; Fig. 1) show that climbing fibres are peripherin-immunopositive in the rat. But what the authors do not evaluate is the extensive peripherin staining in the rat Sp5 in the same paper (Errante et al., 1998, Fig. 2). The image provided by the authors of their peripherin immunostaining (their new Figure 2) shows what I would call the Sp5 of the elephant to be strongly peripherin immunoreactive, just like the rat shown in Errante et al. (1998), and moreover in the precise position of the rat Sp5! This makes sense as this is where the axons subserving the "extraordinary" tactile sensitivity of the elephant trunk would be found (in the standard model of mammalian brainstem anatomy). Interestingly, the peripherin immunostaining in the elephant is clearly lamellated...this coincides precisely with the description of the trigeminal sensory nuclei in the elephant by Maseko et al. (2013) as pointed out by the authors in their rebuttal. Errante et al. (1998) also point out peripherin immunostaining in the inferior olive, but according to the authors this is only "weakly present" in the elephant IOM/VsensR. This latter point is crucial. 
Surely if the elephant has an extraordinary sensory innervation from the trunk, with 400 000 axons entering the brain, the VsensR/IOM should be highly peripherin-immunopositive, including the myelinated axon bundles?! In this sense, the authors argue against their own interpretation - either the elephant trunk is not a highly sensitive tactile organ, or the VsensR is not the trigeminal nuclei it is supposed to be. 

      Comment: We made sure that elephant climbing fibers are strongly peripherin-positive (our revised Figure 2). As we already noted in our previous ms, we see weak diffuse peripherin-reactivity in the trigeminal nucleus (the inferior olive according to the referee), but no peripherin-reactive axon bundles (i.e. climbing fibers) of the kind seen in the inferior olive of other species. We also see no peripherin-reactive axon bundles (i.e. the olivo-cerebellar tract) arriving in the trigeminal nucleus, as the tissue surrounding the trigeminal nucleus is devoid of peripherin-reactivity. Again, this finding is incompatible with the referee’s ideas. As far as we can tell, the trigeminal fibers are not reactive for peripherin in the elephant, i.e. we did not observe peripherin-reactivity very close to the nerve entry, but unfortunately, we did not stain for peripherin-reactivity into the nerve. As the referee alludes to, the absence of peripherin-reactivity in the trigeminal tract is a difference between rodents and elephants.

      Change: Our novel Figure 2.

      Summary: 

      (1) Comparative data of species closely related to elephants (Afrotherians) demonstrates that not all mammals exhibit the "serrated" appearance of the principal nucleus of the inferior olive. 

      (2) The location of the IO and Vsens as reported in the current study (IOR and VsensR) would require a significant, and unprecedented, rearrangement of the brainstem in the elephants independently. I argue that the underlying molecular and genetic changes required to achieve this would be so extreme that it would lead to lethal phenotypes. Arguing that the "switcheroo" of the IO and Vsens does occur in the elephant (and no other mammals) and thus doesn't lead to lethal phenotypes is a circular argument that cannot be substantiated. 

      (3) Myelin stripes in the subnuclei of the inferior olivary nuclear complex are seen across all related mammals as shown above. Thus, the observation made in the elephant by the authors in what they call the VsensR, is similar to that seen in the IO of related mammals, especially when the IO takes on a more bulbous appearance. These myelin stripes are the origin of the olivocerebellar pathway, and are indeed calretinin immunopositive in the elephant as I show. 

      (4) What the authors see aligns perfectly with what has been described previously, the only difference being the names that nuclear complexes are being called. But identifying these nuclei is important, as any functional sequelae, as extensively discussed by the authors, is entirely dependent upon accurately identifying these nuclei. 

      (5) The peripherin immunostaining scores an own goal - if peripherin is marking peripheral nerves (as the authors and I believe it is), then why is the VsensR/IOM only "weakly positive" for this stain? This either means that the "extraordinary" tactile sensitivity of the elephant trunk is non-existent, or that the authors have misinterpreted this staining. That there is extensive staining in the fibre pathway dorsal and lateral to the IOR (which I call the spinal trigeminal tract), supports the idea that the authors have misinterpreted their peripherin immunostaining.

      (6) Evolutionary expediency. The authors argue that what they report is an expedient way in which to modify the organisation of the brainstem in the elephant to accommodate the "extraordinary" tactile sensitivity. I disagree. As pointed out in my first review, the elephant cerebellum is very large and comprised of huge numbers of morphologically complex neurons. The inferior olivary nuclei in all mammals studied in detail to date, give rise to the climbing fibres that terminate on the Purkinje cells of the cerebellar cortex. It is more parsimonious to argue that, in alignment with the expansion of the elephant cerebellum (for motor control of the trunk), the inferior olivary nuclei (specifically the principal nucleus) have had additional neurons added to accommodate this cerebellar expansion. Such an addition of neurons to the principal nucleus of the inferior olive could readily lead to the loss of the serrated appearance of the principal nucleus of the inferior olive, and would require far fewer modifications in the developmental genetic program that forms these nuclei. This type of quantitative change appears to be the primary way in which structures are altered in the mammalian brainstem. 

      Comment: We still disagree with the referee. We note that our conclusions rest on the analysis of 8 elephant brainstems, which we sectioned in three planes and stained with a variety of metabolic and antibody stains, and in which we assigned two structures (the inferior olive and the trigeminal nucleus). Most of the evidence cited by the referee stems from a single paper, in which 147 structures were identified based on the analysis of a single brainstem sectioned in one plane and stained with a limited set of antibodies. Our synopsis of the evidence is the following.

      (1) We agree with the referee that concerning brainstem position our scheme of a ventromedial trigeminal nucleus and a dorsolateral inferior olive deviates from the usual mammalian position of these nuclei (i.e. a dorsolateral trigeminal nucleus and a ventromedial inferior olive).

      (2) Cytoarchitectonics support our partitioning scheme. The compact cellular appearance of our ventromedial trigeminal nucleus is characteristic of trigeminal nuclei. The serrated appearance of our dorsolateral inferior olive is characteristic of the mammalian inferior olive; we acknowledge that the referee claims exceptions here. To our knowledge, nobody has described a mammalian trigeminal nucleus with a serrated appearance (which would apply to the elephant in case the trigeminal nucleus is situated dorsolaterally).

      (3) Metabolic staining (cytochrome oxidase reactivity) supports our partitioning scheme. Specifically, our ventromedial trigeminal nucleus shows intense cytochrome oxidase reactivity, as is seen in the trigeminal nuclei of trigeminal tactile experts.

      (4) Isomorphism. The myelin stripes on our ventromedial trigeminal nucleus are isomorphic to trunk wrinkles. Isomorphism is a characteristic of somatosensory brain structures (barrels, barrelettes, nose-stripes, etc.) and we know of no case where such isomorphism was misleading.

      (5) The large-scale organization of our ventromedial trigeminal nuclei in anterior-posterior repeats is characteristic of the mammalian trigeminal nuclei. To our knowledge, no such organization has ever been reported for the inferior olive.

      (6) Connectivity analysis supports our partitioning scheme. According to our delineation of the elephant olivo-cerebellar tract, our dorsolateral inferior olive is connected via peripherin-positive climbing fibers to the cerebellum. In contrast, our ventromedial trigeminal nucleus (the referee’s inferior olive) is not connected via climbing fibers to the cerebellum.

      Change: As discussed, we advanced further evidence in this revision. Our partitioning scheme (a ventromedial trigeminal nucleus and a dorsolateral inferior olive) is better supported by data and makes more sense than the referee’s suggestion (a dorsolateral trigeminal nucleus and a ventromedial inferior olive). It should be published.

      Reviewer #3 (Public Review):

      Summary: 

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identify large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes, based on their number and patterning, that they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified. 

      Comment: The referee provides a summary of our work. The referee also notes that the correct identification of the trigeminal nucleus is critical to the message of our paper.

      Change: In line with these assessments we focused our revision efforts on the issue of trigeminal nucleus identification, please see our introductory comments and our response to Referee 2.

      Strengths: 

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: We appreciate this positive assessment.

      Change: None

      Weaknesses: 

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections. 

      Comment: We understand these criticisms and the need for cautious interpretation. As we noted previously, we think that the eLife publishing scheme, where critical referee commentary is published along with our ms, will make this contribution particularly valuable.

      Change: Our additional efforts to secure the correct identification of the trigeminal nucleus.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data to different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings. 

      Comment: We understand why the referee asks for additional comparative data, which would make our study more meaningful. We note that we already published a quantitative comparison of African and Asian elephant facial nuclei (Kaufmann et al. 2022). The quantitative differences between African and Asian elephant facial nuclei are similar in magnitude to what we observed here for the trigeminal nucleus, i.e. African elephants have about 10-15% more facial nucleus neurons than Asian elephants. The referee also notes that data on overall elephant brain size might be important for interpreting our data. We agree with this sentiment and we are preparing a ms on African and Asian elephant brain size. We find – unexpectedly given the larger body size of African elephants – that African elephants have smaller brains than Asian elephants. The finding might imply that African elephants, which have more facial nucleus neurons and more trigeminal nucleus trunk module neurons, are neurally more specialized in trunk control than Asian elephants.

      Change: We are preparing a further ms on African and Asian elephant brain size; a first version of this work has been submitted.

      Reviewer #4 (Public Review): 

      Summary: 

      The authors report a novel isomorphism in which the folds of the elephant trunk are recognizably mapped onto the principal sensory trigeminal nucleus in the brainstem. Further, they identify the enlarged nucleus as being situated in this species in an unusual ventral midline position.

      Comment: The referee summarizes our work.

      Change: None.

      Strengths: 

      The identity of the purported trigeminal nucleus and the isomorphic mapping with the trunk folds is supported by multiple lines of evidence: enhanced staining for cytochrome oxidase, an enzyme associated with high metabolic activity; dense vascularization, consistent with high metabolic activity; prominent myelinated bundles that partition the nucleus in a 1:1 mapping of the cutaneous folds in the trunk periphery; near absence of labeling for the anti-peripherin antibody, specific for climbing fibers, which can be seen as expected in the inferior olive; and a high density of glia.

      Comment: The referee again reviews some of our key findings.

      Change: None. 

      Weaknesses: 

      Despite the supporting evidence listed above, the identification of the gross anatomical bumps, conspicuous in the ventral midline, is problematic. This would be the standard location of the inferior olive, with the principal trigeminal nucleus occupying a more dorsal position. This presents an apparent contradiction which at a minimum needs further discussion. Major species-specific specializations and positional shifts are well-documented for cortical areas, but nuclear layouts in the brainstem have been considered as less malleable. 

      Comment: The referee notes that our discrepancy with referee 2 needs to be addressed with further evidence and discussion, given the unusual position of both inferior olive and trigeminal nucleus in the partitioning scheme and that the mammalian brainstem tends to be positionally conservative. We agree with the referee. We note that – based on the immense size of the elephant trigeminal ganglion (50 g), half the size of a monkey brain – it was expected that the elephant trigeminal nucleus ought to be exceptionally large.

      Change: We did additional experimental work to resolve this matter: (i) We ascertained that elephant climbing fibers are strongly peripherin-positive. (ii) Based on elephant climbing fiber peripherin-reactivity we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as inferior olive to the cerebellum. (iii) We also found that the trigeminal nucleus (the structure the referee refers to as inferior olive) appears to receive no climbing fibers. (iv) We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2023 was erroneous (Referee-Figure 1). These novel findings support our ideas.

      Reviewer #5 (Public Review): 

      After reading the manuscript and the concerns raised by reviewer 2 I see both sides of the argument - the relative location of trigeminal nucleus versus the inferior olive is quite different in elephants (and different from previous studies in elephants), but when there is a large disproportionate magnification of a behaviorally relevant body part at most levels of the nervous system (certainly in the cortex and thalamus), you can get major shifting in location of different structures. In the case of the elephant, it looks like there may be a lot of shifting. Something that is compelling is that the number of modules separated by the myelin bands corresponds to the number of trunk folds, which is different in the different elephants. This sort of modular division based on body parts is a general principle of mammalian brain organization (demonstrated beautifully for the cuneate and gracile nucleus in primates, VP in most species, S1 in a variety of mammals such as the star-nosed mole and duck-billed platypus). I don't think these relative changes in the brainstem would require major genetic programming - although some surely exists. Rodents and elephants have been independently evolving for over 60 million years so there is a substantial amount of time for changes in each lineage to occur.

      I agree that the authors have identified the trigeminal nucleus correctly, although comparisons with more out groups would be needed to confirm this (although I'm not suggesting that the authors do this). I also think the new figure (which shows previous divisions of the brainstem versus their own) allows the reader to consider these issues for themselves. When reviewing this paper, I actually took the time to go through atlases of other species and even look at some of my own data from highly derived species. Establishing homology across groups based only on relative location is tough especially when there appears to be large shifts in relative location of structures. My thoughts are that the authors did an extraordinary amount of work on obtaining, processing and analyzing this extremely valuable tissue. They document their work with images of the tissue and their arguments for their divisions are solid. I feel that they have earned the right to speculate - with qualifications - which they provide. 

      Comment: The referee summarizes our work and appears to be convinced by the line of our arguments. We are most grateful for this assessment. We add, again, that the skeptical assessment of referee 2 will be published as well and will give the interested reader the possibility to view another perspective on our work.

      Change: None. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      With this manuscript being virtually identical to the previous version, it is possible that some of the definitive conclusions about having identified the elephant trigeminal nucleus and trunk representation should be moderated in a more nuanced manner, especially given the careful and experienced perspective from reviewers with first-hand knowledge of elephant neuroanatomy.

      Comment: We agree that both our first and second revisions were very much centered on the debate of the correct identification of the trigeminal nucleus and that our ms did not evolve as much in other regards. This being said we agree with Referee 2 that we needed to have this debate. We also think we advanced important novel data in this context (the delineation of elephant olivo-cerebellar tract through the peripherin-antibody).

      Changes: Our revised Figure 2. 

      The peripherin staining adds another level of argument to the authors having identified the trigeminal brainstem instead of the inferior olive, if differential expression of peripherin is strong enough to distinguish one structure from the other.

      Comment: We think we showed too little peripherin-antibody staining in our previous revision. We have now addressed this problem.

      Changes: Our revised Figure 2, i.e. the delineation of the elephant olivo-cerebellar tract through the peripherin-antibody.

      There are some minor corrections to be made with the addition of Fig. 2., including renumbering the figures in the manuscript (e.g., 406, 521). 

      I continue to appreciate this novel investigation of the elephant brainstem and find it an interesting and thorough study, with the use of classical and modern neuroanatomical methods.

      Comment: We are thankful for this positive assessment.

      Reviewer #2 (Recommendations For The Authors):

      I do realise the authors are very unhappy with me and the reviews I have submitted. I do apologise if feelings have been hurt, and I do understand the authors put in a lot of hard work and thought to develop what they have; however, it is unfortunate that the work and thoughts are not correct. Science is about the search for the truth and sometimes we get it wrong. This is part of the scientific process and why most journals adhere to strict review processes of scientific manuscripts. As I said previously, the authors can use their data to write a paper describing and quantifying Golgi staining of neurons in the principal olivary nucleus of the elephant that should be published in a specialised journal and contextualised in terms of the motor control of the trunk and the large cerebellum of the elephant. 

      Comment: We appreciate the referee’s kind words. Also, no hard feelings from our side, this is just a scientific debate. In our experience, neuroanatomical debates are resolved by evidence and we note that we provide evidence strengthening our identification of the trigeminal nucleus and inferior olive. As far as we can tell from this effort and the substantial evidence accumulated, the referee is wrong.

      Reviewer #4 (Recommendations For The Authors):

      As a new reviewer, I have benefited from reading the previous reviews and Author response, even while having several new comments to add. 

      (1) The identification of the inferior olive and trigeminal nuclei is obviously center stage. An enlargement of the trigeminal nuclei is not necessarily problematic, given the published reports on the dramatic enlargement of the trigeminal nerve (Purkart et al., 2022). At issue is the conspicuous relocation of the trigeminal nuclei that is being promoted by Reveyaz et al. Conspicuous rearrangements are not uncommon; for example, primary sensory cortical fields in different species (fig. 1 in H.H.A. Oelschlager for dolphins; S. De Vreese et al. (2023) for cetaceans, L. Krubitzer on various species, in the context of evolution). The difficult point here concerns what looks like a rather conspicuous gross anatomical rearrangement, in BRAINSTEM - the assumption being that the brainstem bauplan is going to be specifically conservative and refractory to gross anatomical rearrangement. 

      Comment: We agree with the referee that the brainstem rearrangements are unexpected. We also think that the correct identification of nuclei needs to be at the center of our revision efforts.

      Change: Our revision provided further evidence (delineation of the olivo-cerebellar tract, characterization of the trigeminal nerve entry) about the identity of the nuclei we studied.

      Why would a major nucleus shift to such a different location? and how? Can ex vivo DTI provide further support of the correct identification? Is there other "disruption" in the brainstem? What occupies the traditional position of the trigeminal nuclei? An atlas-equivalent coronal view of the entire brainstem would be informative. The Authors have assembled multiple criteria to support their argument that the ventral "bumps" are in fact a translocated trigeminal principal nucleus: enhanced CO staining, enhanced vascularization, enhanced myelination (via Golgi stains and tomography), very scant labeling for a climbing fiber specific antibody (anti-peripherin), vs. dense staining of this in the alternative structure that they identify as IO; and a high density of glia. Admittedly, this should be sufficient, but the proposed translocation (in the BRAINSTEM) is sufficiently startling that this is arguably NOT sufficient.

      The terminology of "putative" is helpful, but a more cogent presentation of the results and more careful discussion might succeed in winning over at least some of a skeptical readership.

      Comment: We do not know what led to the elephant brainstem rearrangements we propose. If the trigeminal nuclei had expanded isometrically in elephants from the ancestral pattern, one would have expected a brain with big lateral bumps, not the elephant brain with its big ventromedial bumps. We note, however, that very likely the expansion of the elephant trigeminal nuclei did not occur isometrically. Instead, the neural representation of the elephant nose expanded dramatically and in rodents the nose is represented ventromedially in the brainstem face representation. Thus, we propose a ‘ventromedial outgrowth model’ according to which the elephant ventromedial trigeminal bumps result from a ventromedially directed outgrowth of the ancestral ventromedial nose representation.

      We advanced substantially more evidence to support our partitioning scheme, including the delineation of the olivo-cerebellar tract based on peripherin-reactivity. We also identified problems in previous partitioning schemes, such as the claim that the trigeminal nerve continues into the ~4x smaller olivocerebellar tract (Referee-Figure 1C, D); we think such a flow of fibers (which is also at odds with peripherin-antibody-reactivity and the appearance of nerve and olivocerebellar tract) is highly unlikely if not physically impossible. With all that we do not think that we overstate our case in our cautiously presented ms.

      Change: We added evidence on the identification of elephant trigeminal nuclei and inferior olive.

      (2) Role of myelin. While the photos of myelin are convincing, it would be nice to have further documentation. Gallyas? Would antibodies to MBP work? What is the myelin distribution in the "standard" trigeminal nuclei (human? macaque or chimpanzee?). What are alternative sources of the bundles? Regardless, I think it would be beneficial to de-emphasize this point about the role of myelin in demarcating compartments.

      I would in fact suggest an alternative (more neutral) title that might highlight instead the isomorphic feature; for example, "An isomorphic representation of Trunk folds in the Elephant Trigeminal Nucleus." The present title stresses myelin, but figure 1 already focuses on CO. Additionally, the folds are actually mentioned almost in passing until later in the manuscript. I recommend a short section on these at the beginning of the Results to serve as a useful framework.

      Here I'm inclined to agree with the Reviewer, that the Authors' contention that the myelin stripes serve PRIMARILY to separate trunk-fold domains is not particularly compelling and arguably a distraction. The point can be made, but perhaps with less emphasis. After all, the fact that myelin has multiple roles is well-established, even if frequently overlooked. In addition, the Authors might make better use of an extensive relevant literature related to myelin as a compartmental marker; for example, results and discussion in D. Haenelt....N. Weiskopf (eLife, 2023), among others. Another example is the heavily myelinated stria of Gennari in primate visual cortex, consisting of intrinsic pyramidal cell axons, but where the role of the myelination has still not been elucidated.

      Comment: (1) Documentation of myelin. We note that we show further identification of myelinated fibers by the fluorescent dye fluomyelin in Figure 4B. We also performed additional myelin stains, such as the gold-myelin stain after the protocol of Schmued (Referee-Figure 2). In the end, nothing worked quite as well to visualize myelin-stripes as the bright-field images shown in Figure 4A, and it is only these images that allowed us to match myelin-stripes to trunk folds. Hence, we focus our presentation on these images.

      (2) Title: We get why the referee envisions an alternative title. This being said, we would like to stick with our current title, because we feel it highlights the major novelty we discovered.

      (3) We agree with many of the other comments of the referee on myelin phenomenology. We missed the Haenelt reference pointed out by the referee and think it is highly relevant to our paper.

      Change: 1. Referee Figure. 2. Inclusion of the Haenelt-reference.

      Author response image 2.

      Myelin stripes of the elephant trunk module visualized by Gold-chloride staining according to Schmued

      A, Low magnification micrograph of the trunk module of African elephant Indra stained with AuCl according to Schmued. The putative finger is to the left, proximal is to the right. Myelin stripes can easily be recognized. The white box indicates the area shown in B.

      B, high magnification micrograph of two myelin stripes. Individual gold-stained (black) axons organized in myelin stripes can be recognized.

      Schmued, L. C. (1990). A rapid, sensitive histochemical stain for myelin in frozen brain sections. Journal of Histochemistry & Cytochemistry, 38(5), 717-720.

      Are the "bumps" in any way "analogous" to the "brain warts" seen in entorhinal areas of some human brains (G. W. van Hoesen and A. Solodkin, 1993)?

      Comment: We think this is a similar phenomenon.

      Change: We included the van Hoesen and Solodkin (1993) reference in our discussion.

      At least slightly more background (ie, a separate section or, if necessary, supplement) would be helpful, going into more detail on the several subdivisions of the ION and if these undergo major alterations in the elephant.

      Comment: The strength of the paper is the detailed delineation of the trunk module, based on myelin stripes and isomorphism. We don’t think we have strong evidence on ION subdivisions, because it appears the trigeminal tract cannot be easily traced in elephants. Accordingly, we find it difficult to add information here.

      Change: None.

      Is there evidence from the literature of other conspicuous gross anatomical translocations, in any species, especially in subcortical regions? 

      Comment: The best example that comes to mind is the star-nosed mole brainstem. There is a beautiful paper comparing the star-nosed mole brainstem to the normal mole brainstem (Catania et al 2011). The principal trigeminal nucleus in the star-nosed mole is far more rostral and also more medial than in the mole; still, such rearrangements are minor compared to what we propose in elephants.

      Catania, Kenneth C., Duncan B. Leitch, and Danielle Gauthier. "A star in the brainstem reveals the first step of cortical magnification." PloS one 6.7 (2011): e22406.

      Change: None.

      (3) A major point concerns the isomorphism between the putative trigeminal nuclei and the trunk specialization. I think this can be much better presented, at least with more discussion and other examples. The Authors mention about the rodent "barrels," but it seemed strange to me that they do not refer to their own results in pig (C. Ritter et al., 2023) nor the work from Ken Catania, 2002 (star-nosed mole; "fingerprints in the brain") or other that might be appropriate. I concur with the Reviewer that there should be more comparative data. 

      Comment: We agree.

      Change: We added a discussion of other isomorphisms, including the star-nosed mole, to our paper.

      (4) Textual organization could be improved. 

      The Abstract all-important Introduction is a longish, semi "run-on" paragraph. At a minimum this should be broken up. The last paragraph of the Introduction puts forth five issues, but these are only loosely followed in the Results section. I think clarity and good organization are of the utmost importance in this manuscript. I recommend that the Authors begin the Results with a section on the trunk folds (currently figure 5, and discussion), continue with the several points related to the identification of the trigeminal nuclei, and continue with a parallel description of ION with more parallel data on the putative trigeminal and IO structures (currently referee Table 1, but incorporate into the text and add higher magnification of nucleus-specific cell types in the IO and trigeminal nuclei). Relevant comparative data should be included in the Discussion.

      Comment: 1. We agree with the referee that our abstract needed to be revised. 2. We also think that our ms was heavily altered by the insertion of the new Figure 2, which complemented Figure 1 from our first submission and is concerned with the identification of the inferior olive. From a standpoint of textual flow such changes were not ideal, but the revisions massively added to the certainty with which we identify the trigeminal nuclei. Thus, although we are not as content as we were with the flow, we think the ms advanced in the revision process and we would like to keep the Figure sequence as is. 3. We already noted above that we included additional comparative evidence.

      Change: 1. We revised our abstract. 2. We added comparative evidence.

      Reviewer #5 (Recommendations For The Authors): 

      The data is invaluable and provides insights into some of the largest mammals on the planet. 

      Comment: We are incredibly thankful for this positive assessment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review: 

      This study used ATAC-Seq to characterize chromatin accessibility during stages of GABAergic neuron development in induced pluripotent stem cells (iPSCs) derived from both Dravet Syndrome (DS) patients and healthy donors. The authors report accelerated GABAergic maturation to a point, followed by further differentiation into a perturbed chromatin profile, in the cells from patients. In a preliminary analysis, valproic acid, an anti-seizure medication commonly used in patients with DS, increased open chromatin in both patient and control iPSCs in a nonspecific manner, and to different degrees in cultures derived from different patients. These findings provide new information about DS-associated changes in chromatin, and provide further evidence for developmental abnormalities in interneurons with DS. 

      Strengths:

      This is a novel study that aims to investigate the epigenetic changes that occur in a sodium channel model of epilepsy; these changes are often ignored but may be an interesting area for future therapeutics. In general, the flow of the paper is good, and the figures are well-designed.

      Reply: Thank you for your positive feedback about our work.

      Weaknesses:

      The most substantial weakness relates to the observation that DS is often viewed as a monogenic form of epilepsy. It is directly linked to SCN1A gene haploinsufficiency (Yu et al, 2006; Ogiwara et al, 2007). The gene product is Nav1.1, the alpha subunit of voltage-gated sodium channel type I that regulates neuronal excitability. Yet, analysis was conducted at time points of GABAergic interneuron differentiation in which SCN1A is likely not expressed. The paper would be strengthened if SCN1A expression and Nav1.1 protein were examined across the experimental time course. If SCN1A is not yet expressed, this would complicate any explanation of how the observed epigenetic changes might arise. It also seems counterintuitive that the absence of a sodium channel can accelerate differentiation, when, a priori, one might expect the opposite (a 'less neuronal' signal). 

      Thanks, this is an important point! In our revised manuscript, we have incorporated data on the expression of SCN1A at d19 and d65 of GABAergic development in both the control and patient groups. We first retrieved data from our previous RNA-Seq analysis, showing SCN1A gene expression in our cells at both d19 and d65. We have now updated our text on the SCN1A gene expression in the revised manuscript (Revised Supplementary Figure 1A, revised text Line 108-109). Second, we confirmed the dynamics of SCN1A expression by real-time quantitative RT-PCR analysis at four time points of GABAergic development (d0, d19, d35 and d65). Notably, expression of SCN1A was detected by qRT-PCR from d19 and the expression increased with differentiation. We have now included this information in the revised manuscript (Revised Supplementary Figure 1B, revised text Line 112).

      Related to this, another important limitation of the study is that the controls are cells derived from healthy individuals and not from isogenic lines. The usage of isogenic lines is extremely relevant for every study in which iPSC-derived somatic cells are used to model a disease, but specifically in diseases like DS, in which the genetic background has an ascertained impact on disease phenotype (Cetica et al, 2017 and others). This serious limitation should be considered.

      Yes, we fully agree that isogenic and edited patient-derived iPSC would have been the ideal controls. At an early stage we therefore invested considerable time and effort in order to generate isogenic lines from patient-derived iPSC. However, editing of the SCN1A variants in patient-derived iPSC proved unsuccessful after several trials and modifications, so we finally turned to iPSC from healthy donors. This is now discussed together with other limitations of our study in the revised manuscript (end of discussion section, lines 499-506).

      In addition, the authors should provide data on variability across cell lines and differentiations to help convince the reader that the results can be attributed to genetic defects, rather than variability across individuals. 

      This is a valuable point. In the revised manuscript, we have now added plots and IF staining from individual samples to give the readers a complete picture on how they are distributed (Revised Supplementary Figure 1C, Revised Supplementary Figure 2, and Revised Supplementary Figure 4).

      In the revised manuscript, we incorporated an explanation of the strategy used to compare the two groups (cases vs. controls) in more detail. In our analysis, we first compared the dynamic changes of chromatin accessibility cell line by cell line across differentiation. We then extracted the common changes from different cell lines at each time point (Revised text line 152-155, line 226-228). Using this strategy, we extracted the common changes confined to the control and patient groups, respectively. With this approach we avoid capturing the variability across individuals.
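
      As an illustration only, the idea of keeping the changes shared by every cell line in a group at one time point could be sketched as a set intersection. The cell-line names and peak identifiers below are hypothetical, and this is an assumed simplification rather than the pipeline used in the study.

```python
def common_changes(per_line_changes):
    """Return the region IDs that changed in every cell line of a group.

    per_line_changes maps a cell-line name to the set of regions whose
    accessibility changed in that line at a given time point.
    """
    sets = [set(regions) for regions in per_line_changes.values()]
    # With no cell lines there is nothing in common to report.
    return set.intersection(*sets) if sets else set()


# Hypothetical example: three control lines at one time point.
control_d19 = {
    "ctrl_1": {"peak_1", "peak_2", "peak_3"},
    "ctrl_2": {"peak_2", "peak_3", "peak_4"},
    "ctrl_3": {"peak_2", "peak_3", "peak_5"},
}

shared = common_changes(control_d19)
print(sorted(shared))
```

      Regions that change in only one line (for example peak_1 or peak_5 above) drop out, which is how such a filter would reduce the contribution of individual-specific variability.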

      Additionally, the authors acknowledge the variability of the differentiations and cell lines, which is commendable, and they attribute this to "possibly reflecting cell line specific and endogenous differences reported previously", but could also have to do with cell death. This is a large confounding factor for ATAC-seq. Certainly, Sup Fig 1C shows lower FrIP scores, consistent with cell death, and there seems to be a lot of death in the representative images. Moreover, the iGABA neurons are very difficult to keep alive, especially to 65 days, without co-culturing with glia and/or glutamatergic neurons. The authors should comment on how much these factors may have influenced their results. 

      With this point in mind, we re-examined QC of our ATAC-Seq across all samples: As shown in revised Supplementary Figure 2C and Supplementary Figure 4C, our cutoff for FRiP is 15%, and all samples have a FRiP of more than 15%. At the later time points (d35 or d65), we did not observe a FRiP <15%. We therefore feel confident that the quality of ATAC-Seq is good enough for downstream analysis and data interpretation.
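
      For readers unfamiliar with the metric, FRiP (fraction of reads in peaks) is the number of reads overlapping called peaks divided by the total number of reads. A minimal sketch of a 15% cutoff check follows; the function names and read counts are hypothetical, and treating a sample exactly at the cutoff as passing is an assumption.

```python
def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks: reads overlapping peaks / all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


def passes_qc(reads_in_peaks, total_reads, cutoff=0.15):
    """True when the sample's FRiP reaches the cutoff (assumed inclusive)."""
    return frip(reads_in_peaks, total_reads) >= cutoff


# Hypothetical read counts, not data from the study.
samples = {
    "sample_a": (4_200_000, 20_000_000),  # FRiP 0.21
    "sample_b": (2_000_000, 20_000_000),  # FRiP 0.10
}

for name, (in_peaks, total) in samples.items():
    print(name, round(frip(in_peaks, total), 2), passes_qc(in_peaks, total))
```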

      Regarding the differentiation protocol, we are following a directed protocol of iPSC towards interneurons. The protocol is described in detail by Maroof et al (reference 34) and slightly modified in our lab (described in reference 13). With our modified protocol, GABAergic cells are viable beyond day 65 without the need for co-cultures with astrocytes or microglia. This is also reflected by the electrophysiological activity of interneurons at d65 and at later time points (reference 13). Additionally, our ambition was to obtain a homogeneous cell population for further analysis. Adding other cell types to the cultures would have interfered with downstream processes and created a need for cell sorting. Using our protocol, we obtain viable GABA interneurons after up to 100 days in culture. To assess the viability of our cells at the point of sampling (other than by morphological assessment), we used Trypan blue staining and an automated cell counter. Only samples with a viability >90%, which is a commonly used cut-off for cell viability, were processed for ATAC-seq. We have now modified the method section in the revised version to describe the GABAergic differentiation and sampling (line 519-529).

      Finally, changes in gene expression are only inferred, as no RNA levels were measured. If RNA-seq was not possible it would have been good to see at least some of the key genes/findings corroborated with RNA/protein levels vs chromatin accessibility alone, particularly given that these molecular readouts do not always correlate. 

      In our revised manuscript, we include our recently published RNA-seq performed at d19 and d65. We also correlated the RNA-seq and ATAC-seq data obtained from the same samples. The Pearson correlations between gene expression and chromatin accessibility were within the range 0.49-0.57 (Revised Supplementary Figure 2G, Revised Supplementary Figure 4G), which is acceptable according to standard criteria. The results confirmed that the quality of ATAC-Seq is good enough for analysis of expression levels and chromatin openness in key genes. We also added gene expression levels from RNA-seq (d19 and d65) in our revised manuscript (Revised Figure 1G, Revised Figure 2G). Finally, we performed qRT-PCR analysis of key genes in each cluster and the results are now included in the revised version (Revised Supplementary Figure 3E, Revised Supplementary Figure 5E).
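
      A Pearson correlation of this kind compares, gene by gene, an expression value with an accessibility value from the same sample. The sketch below shows the calculation on hypothetical numbers; how the values were normalised and which regions were assigned to each gene in the study are not specified here, so the inputs are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and standard deviations share the same 1/n factor,
    # so it cancels and is omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-gene values (e.g. log-scaled), not data from the study.
expression = [1.2, 3.4, 2.2, 5.1, 4.0]
accessibility = [0.8, 2.9, 2.5, 3.9, 4.4]

print(round(pearson_r(expression, accessibility), 2))
```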

      Additional Points:

      (1) Representative images for cell-identity markers for only D65 are shown, and not D0, D19, and D35 though it is stated in the text that this was performed. At a minimum, these representative images should be shown for all lines. 

      As suggested, we have now added images for cell identity markers of all iPSC lines in the revised version (Revised Supplementary Figure 1C).

      (2) What QC was performed on iPSC lines, i.e. karyotype/CNV analysis and confirmation of genotypes?

      All iPSC lines used in this study have been fully characterized according to standard and state-of-the-art procedures: Expression of pluripotency and stemness genes has been shown by immunostaining, flow cytometry and scorecard analysis; integrity of the genome has been assessed by karyotyping using G-banding; differentiation capacity was characterized using an embryoid body assay in combination with scorecard analysis; and genotypes were verified by Sanger sequencing. Please see the following publications for full datasets: Schuster et al., Neurobiol Dis 2019; Schuster et al., Stem Cell Res 2019; Sobol et al., Stem Cells and Development 2015. In our lab, the integrity of iPSC lines is routinely verified using flow cytometry (expression of TRA-1-60 and SSEA4), immunostaining (expression of NANOG, SOX2 and OCT4), Sanger sequencing (targeting variants in the SCN1A gene), cell morphology analysis and analysis of mycoplasma by MycoAlert® (Lonza).

      (3) Were all experiments performed on a single differentiation? Or multiples? Were the differentiations performed with the same type? If not, was batch considered in the analysis? 

      Thank you for raising this question. The text in Materials and Methods has been modified as follows, to better describe the differentiation and sampling procedure:

      “GABAergic interneuron differentiation from iPSCs was performed as previously described (reference 13). The protocol utilizes DUAL SMAD inhibition to induce neurogenesis towards neural stem cells for 10 days, followed by patterning with high levels of sonic hedgehog for nine days towards cortically fated neuronal progenitor cells (NPC) and subsequent maturation for 46 days, i.e. a total of 65 days (Figure 1A). Neuronal cells at day 65 and onwards are healthy and viable as judged by morphological assessment by light microscopy. Differentiation was performed at least 3 times per cell line.  

      Cell cultures were sampled at days 0 (D0), D19, D35 and D65, respectively, by harvesting cells with TryplE and centrifugation (300 x g, 3 min). Harvested cells were counted and assessed for viability using trypan blue staining and an automated EVE cell counter (Nano Entek). Samples with a viability of >90% were chosen for ATAC-Seq library preparation (see below).”.  

      I also assume that technical replicates were merged, and then all three biological replicates were kept for each analysis and outliers were not removed, e.g. Control_D19_8F seems like an example of an outlier. 

      This is a valuable point. We agree that there is variability across the three healthy donors and the three patients, respectively, but the quality of the ATAC-seq data is good according to multiple QC assessments (Revised Supplementary Figure 2B-D). The color code in Supplementary Figure 1C may be misleading, as the Pearson correlation of all samples was displayed. Overall, the correlations among ATAC-seq replicates are above 0.8. At the same time, we observed that samples at d0 cluster together, but not at the later time points. We interpret this as related to the cell-line-specific plasticity of chromatin dynamics during differentiation. The observation agrees with our results from PCA (Revised Supplementary Figure 2F).

      (4) In Figure 1C, it is intriguing that the ATACseq signal gets stronger in imN. One might expect it to be strongest in the iPSCs which are undifferentiated and have the highest levels of open chromatin. Is this a function of sequencing depth, or are all the Y-axes normalized across all time points? 

      This is another valuable point. Figure 1C presents the average chromatin openness for cluster-specific regions, not the chromatin openness of the entire genome, which is why the chromatin openness at D35 is higher than at other time points. The genome-wide chromatin openness is presented in revised Supplementary Figure 2D and we have now updated the figure legend to avoid any potential misunderstanding.

      The sequencing depth is in a similar range for all samples. To give the readers a complete picture, we also present the depth of sequencing reads for each sample (Revised Supplementary Figure 2A and Revised Supplementary Figure 4A). The Y-axes of genome browser tracks were normalized, and we added the normalized value in the figures.

      (5) In Figure 1F, are these all enriched terms, or were they prioritized somehow? 

      Yes, the enriched terms are prioritized based on biological meanings, and we have now clarified this in the updated legend of the manuscript. In addition, all enriched terms are now included in revised Supplementary Table 2 and Supplementary Table 4. 

      (6) In Figure 1G (also the same plots in Fig 2/3), are all these images normalized i.e. there is no scale bar for each track, and do they represent and aggregate BAM/bigwig?

      Yes, the genome browser tracks were normalized and we have now revised the figures by adding scale bars.

      It would be good to show in supplement the variability across cell lines/diffs - particularly given the variability in the heatmap/PCA - and demonstrate the rigor/reproducibility of these results. This comment applies to all these plots across the 3 figures, particularly as in some instances the samples appear to cluster by individual first and then time point (Sup Fig 3B). 

      Thanks. We have now revised the figure with plots showing individual samples. 

      How confident are the authors that these effects are driven by genotype and not a single cell line? In the Fig 3D representation of NANOG, it is very difficult to see any difference between patient and control. 

      In Figure 3D, we showed common chromatin dynamics in the control and patient groups. To avoid any misunderstanding, we have now updated our legend in the revised manuscript. 

      (7) For the changes in occupancy annotation (UTR/exon/intron etc), are these differences still significant after correcting for variability from cell line to cell line at each time point? I.e. rather than average across all three samples, what is the range?

      Revised accordingly.

      (8) The VPA timepoint is not well-justified. Given that VPA would be administered in patients with fully mature inhibitory neurons, it is difficult to determine the biological relevance. I appreciate that this is a limitation of the model, but this should at least be addressed in the manuscript. 

      We agree that our model system of GABAergic interneuron development has limitations and that cells may not fully recapitulate the development and physiology in vivo. Obvious factors to consider in our system are the directed protocol to enrich for GABAergic interneurons and the differentiation timeline restricted to 65d. This is now discussed (lines 499-506).

      Recommendations for the authors:

      (1) The term 'mutation' has been replaced with the term ' pathogenic variant' or likely pathogenic variant depending on the context, please see PMID: 25741868 

      Thank you for pointing this out. We have replaced all instances of “mutation” with “pathogenic variant” throughout the manuscript.

      (2) It is unclear what the nomenclature for sample labelling is in Supplementary Figure 1, e.g. 7C, 8F, 1B.  

      We apologize for this confusion. These are cell line names. We labeled all data and images according to cell line name, i.e. control lines: Ctl1B, Ctl7C and Ctl8F; patient lines: DD1C, DD4A, DD5A. To avoid any potential confusion, we have added a note in the revised legend of Supplementary Figure 1B.

      (3) Can the authors confirm that the Deseq2 FDR values are Benjamini-Hochberg procedure corrected per default settings? If so, this should ideally be added to methods or legend for clarity 

      Yes, the DESeq2 FDR values were obtained with the default settings (Benjamini-Hochberg correction). This has been added to the Methods section of the revised manuscript.
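      For readers unfamiliar with the correction referred to here, the following is a minimal sketch of the Benjamini-Hochberg adjustment on a list of p-values. It illustrates the procedure only; it is not DESeq2 code, and DESeq2's default workflow also applies independent filtering before adjustment, which this sketch does not reproduce. The function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # scaling each by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four tests (illustration only).
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      A feature is then called significant when its adjusted value falls below the chosen FDR threshold.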

      (4) While it makes sense that the authors present the data in the order of Figure 1, and Figure 2, this actually makes it quite difficult to compare the two datasets, especially for the functional enrichment in the "F" figures. It may be helpful to consider re-organizing the figure order. For instance, for the long-term potentiation signal in the DS-iPSCs, what does this mean in terms of biological relevance? Or maybe Figure 2 needs to be supplementary given that Figure 3 is a more direct comparison.  

      Thank you for the suggestions. We attempted to reorganize during our revision. We still believe it is easier for the audience to grasp the main message if we organize it according to our current workflow—first presenting an individual differential landscape for controls and patients, and then comparing the common and unique aspects among them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, entitled "Merging Multi-OMICs with Proteome Integral Solubility Alteration Unveils Antibiotic Mode of Action", Dr. Maity and colleagues aim to elucidate the mechanisms of action of antibiotics through combined approaches of omics and the PISA tool to discover new targets of five drugs developed against Helicobacter pylori.

      Strengths:

      Using transcriptomics, proteomic analysis, protein stability (PISA), and integrative analysis, Dr. Maity and colleagues have identified pathways targeted by five compounds initially discovered as inhibitors against H. pylori flavodoxin. This study underscores the necessity of a global approach to comprehensively understanding the mechanisms of drug action. The experiments conducted in this paper are well-designed and the obtained results support the authors' conclusions.

      Weaknesses:

      This manuscript describes several interesting findings. A few points listed below require further clarification:

      (1) Compound IVk exhibits markedly different behavior compared to the other compounds. The authors are encouraged to discuss these findings in the context of existing literature or chemical principles.

      This is a good point. We have added the following paragraph (Page No-13).

      “In several of our studies, compound IVk, which has a higher MIC, exhibits markedly different behavior. This difference in behavior may stem from different sources, including intercellular availability, inactivation inside the cell, or loss of target specificity. Multiple studies have previously demonstrated that there is only a 30% chance for a structurally similar compound to have similar biological activity32.”

      (2) The incubation time for treating H. pylori with the drugs was set at 4 hours for transcriptomic and proteomic analyses, compared to 20 min for PISA analysis. The authors need to explain the reason for these differences in treatment duration.

      This is now explained in Pages 17 and 19, where the following paragraphs have been included

      “The incubation time for transcriptomics and proteomics assays was determined based on the Time-Kill Curves assay (Fig. 6(A)). The 4-hour time point shows a significant amount of cell death compared to the control population.”

      “The target deconvolution method aims to evaluate the initial interaction with intracellular proteins. We selected a 20-minute time point based on intracellular ROS generation (not shown). It is a well-reported phenomenon that bactericidal drugs induce early production of ROS.”

      (3) The PISA method facilitates the identification of proteins stabilized by drug treatment. DnaJ and Trigger factor (Tig), well-known molecular chaperones, prevent protein aggregation under stress. Their enrichment in the soluble fraction is expected and does not necessarily indicate direct stabilization by the drugs. The possibility that their stabilization results from binding to other proteins destabilized by the drugs should be considered. To prevent any misunderstanding, the authors should clarify that their methodology does not solely identify direct targets. Instead, the combination of their findings sheds light on various pathways affected by the treatment.

      This is also a very valuable observation. We now clearly state this in new paragraphs on Pages 8 and 13:

      Another target shared among several compounds is the chaperone protein trigger factor (Tig), which plays a crucial role in facilitating proper protein folding and is indispensable for the survival of bacterial cells. The solubility of this protein has been altered by all the compounds except IVk (Fig. 2(I-J)) in a concentration-dependent manner (Fig. S4(B, D, and E)). The possibility of Tig interacting with other proteins destabilized by the drug, along with the influence of the heat gradient during the PISA assay, may introduce potential noise in the data. Further investigation is required to confirm the interaction of the drug with Tig.

      “The module “black” associated with this compound contains Tig, which is involved in facilitating proper protein folding, as a target, and it down-regulates multiple proteins associated closely with S12 ribosomal protein of the 30S subunit (Fig. S9(D)) indicating its involvement in stabilization of ribosomal protein.”

      (4) At the end of the manuscript, the authors conclude that four compounds "strongly interact with CagA". However, detailed molecule/protein interaction studies are necessary to definitively support this claim. The authors should exercise caution in their statement. As the authors mentioned, additional research (not mandated in the scope of this current paper) is necessary to determine the drug's binding affinity to the proposed targets.

      We have modified the sentence (Page -15) to say:

      “This study identifies four out of our five compounds that induce significant change in the solubility of CagA, the major virulence factor of H. pylori.”

      (5) The authors should clarify the PISA-Express approach over standard PISA. A detailed explanation of the differences between both methods in the main text is important.

      This was already explained in Page 5 (no changes have been made)

      Reviewer #2 (Public Review):

      Summary:

      This work has an important and ambitious goal: understanding the effects of drugs, in this case antimicrobial molecules, from a holistic perspective. This means that the effect of drugs on a group of genes and whole metabolic pathways is unveiled, rather than its immediate effect on a protein target only. To achieve this goal the authors successfully implement the PISA-Express method (Protein Integral Solubility Alteration), using combined transcriptomics, proteomics, and drug-induced changes in protein stability to retrieve a large number of genes and proteins affected by the used compounds. The compounds used in the study (compound IVa, IVb, IVj, and IVk) were all derived from the precursor compound IV, they are effective against Helicobacter pylori, and their mode of action on clusters of genes and proteins has been compared to the one of the known H. pylori drug metronidazole (MNZ). Due to this comparison, and confirmed by the diversity of responses induced by these very similar compounds, it can be understood that the approach used is reliable and very informative. Notably, although all compound IV derivatives were designed to target H. pylori flavodoxin (Fld), only one showed a statistically significant shift of Fld solubility (compound IVj, FIG S11). For most other compounds, instead, the involvement of other possible targets affecting diverse metabolic pathways was also observed, notably concerning a series of genes with other important functions: CagA (virulence factor), FtsY/FtsA (cell division), AtpD (ATP-synthase complex), the essential GTPase ObgE, Tig (protein export), as well as other proteins involved in ribosomal synthesis, chemotaxis/motility and DNA replication/repair.
      Finally, for all tested molecules, in vivo functional data have been collected that parallel the omics predictions, supporting them and showing that compound IV derivatives differently affect cellular generation of reactive oxygen species (ROS), oxygen consumption rates (OCR), DNA damage, and ATP synthesis.

      Strengths:

      The approach used is very potent in retrieving the effects of chemically active molecules (in this case antimicrobial ones) on whole cells, evidencing protein and gene networks that are involved in cell sensitivity to the studied molecules. The choice of these compounds against H. pylori is perfect, showcasing how different the real biological response is, compared to the hypothetical one. In fact, although all molecules were retrieved based on their activity on Fld, the authors unambiguously show that large unexpected gene clusters may, and in fact are, affected by these compounds, and each of them in different manners.

      Impact:

      The present work is the first report relying on PISA-Express performed on living bacterial cells. Because of its findings, this work will certainly have a high impact on the way we design research to develop effective drugs, allowing us to understand the fine effects of a drug on gene clusters, drive molecule design towards specific metabolic pathways, and eventually better plan the combination of multiple active molecules for drug formulation. Beyond this, however, we expect this article to impact other related and unrelated fields of research as well. The same holistic approaches might also allow gaining deep, and sometimes unexpected, insight into the cellular targets involved in drug side effects, drug resistance, toxicity, and cellular adaptation, in fields beyond the medicinal one, such as cellular biology and environmental studies on pollutants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please modify these few concerns:

      -  It is unclear from the introduction and discussion whether conventional transcriptomic and proteomic analyses have previously been conducted on the compounds examined in this study. If only targeted studies have been performed please clarify this further.

      To make it more clear, we have added the following paragraph in Page 5:

      “Our investigation into understanding the mode of action of nitro-benzoxadiazole compounds commenced with a comparison of the conventional transcriptional and translational changes induced by these compounds, the vehicle control (DMSO), and the commercially used drug MNZ. RNA sequencing (RNA-seq) and expressional proteomics were employed to identify transcriptional and translational changes, respectively.”

      -  The decision to monitor the oxygen consumption rate (OCR) is based on the hypothesis that the drugs would impact flavodoxins function. Could the authors cite specific studies that suggest a reduction in flavodoxin leads to decreased OCR that can be measured?

      The reviewer is correct that we performed this study based on our hypothesis that a reduction in flavodoxin may lead to decreased OCR. To our knowledge, there are no previous studies indicating this, so we now clearly state (Page 14) that it is our hypothesis.

      “On the other hand, given that these drugs indicated involvement of multiple factors from the electron transport chain including flavodoxin and we observed significant drop in the ATP production rate (Fig. 6(D)) associated to compounds IV and IVj, we have investigated the changes in oxygen consumption rate (OCR) as we hypothesize that a reduction in soluble flavodoxin could lead to decreased OCR”

      -  Increase font size in some figures and supplemental materials for clarity.

      We acknowledge the reviewer's comment and have addressed it to the best possible extent in the figures.

      -  Correct figure references throughout the text (example of mistake p4, Fig S1D, p6 S1C).

      We have corrected the figure references.

      -  Check spelling errors, for example, Figure S1B: "library preparation".

      We have revised the figures and corrected spelling errors.

      -  Ensure H. pylori is in italics.

      Done!

      -  Figure S4: Replace (D) by (E).

      Done! Thank you.

      -  Page 7: Check the sentence: "...RpleE, InfC) and F Furthermore, we..." .

      Corrected!  

      “The 20 common essential targets are mostly associated with cell division (for example, FtsZ), small subunit ribosomal proteins (RspC, RspE, RspL, RplE, InfC). Furthermore, we identified a few unique changes for compound IV (DnaN, involved in DNA tethering and processivity of DNA polymerases, and C694_06445, which could be a functional equivalent of delta subunit of DNA polymerase III).”

      -  Page 9: Please modify the name of one compound "Compounds IV, IVj (and not IVk) and MnZ downregulate...".

      Both reviewers raised this point, so we revisited the data. Fig S8(B) shows that compounds IV, IVk, and MNZ cluster together and downregulate the genes associated with this pathway. We have therefore not changed the text.

      -  Figure S9: please clarify symbols (triangles and others) in the Figure legend.

      Done!

      -  Page 9: Is it the Figure S9B you are referring to? Talking about proteomics?

      Sorry, we have not understood the above comment.

      Reviewer #2 (Recommendations For The Authors):

      All figures are printed as one per page. In this format, almost all pictures suffer a severe problem with dimensions. Notably graph axes and axis values, subtitles, and legends within the pictures are too small, although the graphical part is almost always appropriate. Negative example (higher fonts are needed): Figure 1. Positive example (font ok): Figure 2A or Figure 3 right panels.

      We have carefully revised our figures to address the issues you mentioned, ensuring that elements are visible when printed one per page. In Fig 1: We have increased the font sizes of the graph axes, axis values, subtitles, and legends to improve readability. Additionally, we have color-matched different Gene Ontology (GO) terms for better readability. In Fig 2: To enhance clarity, we have resized the figure by removing the top 10 protein list, now presented in a separate table. This ensures that the figure's main content remains prominent. These modifications have been made across figures to maintain consistency and readability.

      For all figures, particularly for non-experts, not only a list of what is found in the picture should be provided, but also a minimal, simplified key of interpretation (of what is to be noticed). Particularly relevant for scatter plots.

      We have modified the legends to provide simplified key interpretation for the scatter plots. 

      In general for most analyses I see the involvement of FtsA, whereas most discussions concern FtsY and FtsZ. Maybe this point should be clarified. For example: i) FtsZ is quoted in the Second "Results" paragraph (page 6), but we can't find this gene in Figure 2, nor in the corresponding table (Figure 2A); ii) FtsY downregulation is quoted in the Fifth "Results" paragraph (page 9), but we can't find this gene in Figure 5, 9S or 10S.

      We are not entirely sure if we have understood the reviewer's comment correctly, as we did not mention FtsY in our discussion section. In the discussion section, we have focused on the involvement of FtsZ and FtsA with some of our compounds. We decided to discuss them together because FtsZ is the primary component that is recruited to the membrane by the actin-related protein FtsA, while the role of FtsY remains highly debated.

      Figure 1: same colour for the same GO: term in different panels should be used.

      Done!

      Figure 4: please specify (being it essential throughout the whole paper) that the group colouring only refers to Figure 4A, lower bar.

      Done!

      Figure 5, S9, and S10: having the combination of analysed sets (brown / IV , magenta / IVb, etc....) as a panel subtitle is almost a necessity, to avoid constant page turning. I did rewrite all of them by hand to be able to follow the main text story.

      Done!

      What are the triangles? (this is not written anywhere).

      We have now explained this in the legend of Fig 5.

      Figures S9 and S10 are too crowded (please refer to Figure 5 for a good format/size).

      For supplementary figures S9 and S10 we prefer to keep the gene names, but in order to make them more legible we have now added subtitles to each panel.

      Second and third "Results" paragraph. Explicitly saying that the Second is only focused on TOP 10 hits, at the beginning of the paragraph (while the third on essential genes) would help enormously the non-specialist in orienting among the different sections.

      On page 7, we have revised the text to indicate that the paragraph is only focused on the top 10 hits. Additionally, we have included a table of top 10 hits for better clarity and accessibility. 

      Page 6: the following sentence should be in the introduction, to stress the novelty of the work: "This is the first time PISA assay, in the form of PISA-Express, has been successfully performed in living bacterial cells, with protocols adapted and modified from previous PISA studies in mammalian cells".

      Page- 2 

      We agree this is an important point. However, having stated it in both the abstract and the PISA section of the results, we prefer not to state it once more in the Introduction.

      (no changes made)

      I couldn't find any reference to Figure S3 in the text.

      Included! (P 9)

      "Compounds IV, IVk, and MNZ downregulate the genes associated with this pathway (Fig. 4(B) & S8(B))": it seems to me that it is IVj rather than IVk to downregulate. Please check carefully.

      Both reviewers raised this point, so we revisited the data. Fig S8(B) shows that compounds IV, IVk, and MNZ cluster together and downregulate the genes associated with this pathway. We have therefore not changed the text.

      Page 12: of the pre-defined target like flavodoxin => of the pre-defined target flavodoxin.

      Thanks! We have removed “like” from the sentence.

      Metronidazol (=MNZ) only appears on page 13 (MNZ already on page 8).

      Corrected!  The correspondence is now first indicated in P. 3.

      Please resolve the ambiguity metronidazol/metronidazole (main text and figures).

      We now always say “metronidazole”

      The Sixth "Results" paragraph (pages 10-11) should be developed a bit more. All Figure 6 results are summarized in 8 lines at the end of the paragraph. This doesn't bring much, particularly to a non-specialist reader. Please, for each panel, clearly explain what is to be noticed and what main conclusion(s) can be extracted.

      We have improved the description of the section. The modified part now reads:

      …This indicates that the nitro-bearing groups have a higher propensity to generate ROS. We have also observed that the genes associated with the generation of ROS are significantly overexpressed for compounds IV, IVb, IVj, and MNZ (Fig. S12(A)). As described above and depicted in Fig. S12(B), multiple DNA damage repair proteins and genes are down-regulated in the presence of compounds IV, IVb, IVj, and MNZ. Additionally, DNA PolA was found to be a major target for compound IVj. Following these results, we investigated compound-induced DNA damage using the APO BrdU TUNEL assay. All the compounds, particularly IV and IVj, caused significant DNA damage (Fig. 6(C)).

      On the other hand, given that these drugs indicated involvement of multiple factors from the electron transport chain including flavodoxin and we observed significant drop in the ATP production rate (Fig. 6(D)) associated to compounds IV and IVj, we have investigated the changes in oxygen consumption rate (OCR) as we hypothesize that a reduction in soluble flavodoxin could lead to decreased OCR.  Though the signal-to-noise ratio of these data is poor…

      and we added figure S12 for clarity.   

      In the same section I found: "Compound IV and its derivatives cause a marked increase in ROS generation when compared to the control (DMSO)" => refers to THIS work or previous work? (in the latter case, please quote it).

      This data is from our current paper, as shown in Fig 6(B).

      In the same paragraph, "the signal-to-noise ratio of these data is considerable" => does it mean that you have good (high signal-to-noise) data, or that you have too high noise for precise quantification? I rather understood the latter, but this sentence definitely needs to be rewritten.

      Thank you for pointing out the mistake. Your interpretation is correct. We have corrected the sentence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) The conclusions in the text are very broad and general but often based on a limited number of examples. It would be important that the authors hit the appropriate tone when most of the analysis (in Figure 5) is derived from n=3 events.

      We have tried to hit the correct tone here by modifying our manuscript text. In particular, we have added a pie chart to Figure 4 (Figure 4C) that summarises data from all RBMX targets, not just the original n=3, and shows that most RBMX targets are rescued by RBMXL2.

      (2) The fractions of long/ultra-long exons actually bound by/regulated by RBMX are not clearly stated - which is in contrast to the general statement of the title (implying a global role for RBMX in proper splicing of ultra-long exons).

      (i) We have changed our title (now “An anciently diverged family of RNA binding proteins maintain correct splicing of a class of ultra-long exons through cryptic splice site repression”).

      (ii) We also include much clearer text about the fractions of long/ultra-long exons bound by RBMX, as follows:

      “…..This led us to test whether RBMX protein is preferentially associated with long exons. For this we plotted the distribution of internal exons bound and regulated by RBMX together with all internal exons expressed from HEK293 mRNA genes (Liu et al., 2017) (Figure 2 – Source Data 1). We found that RBMX controls and binds two different classes of exons: the first have comparable length to the average HEK293 exon, while the second were extremely long, exceeding 1000 bp in length (Figure 2F). We defined this second class as ‘ultra-long exons’, which represented the 18.9% of internal exons regulated by RBMX and 17.6% of the ones that contained RBMX iCLIP tags. These proportions were significantly enriched compared to the general abundance of internal ultra-long exons expressed from HEK293 cells, which was only 0.4% (Figure 2G)……”

      “…….We next wondered whether ultra-long exons regulated by RBMX (which represented 11.6% of all ultra-long internal exons from genes expressed in HEK293) had any particular feature compared to ultra-long exons that were RBMX-independent……..”
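      The proportions quoted above are fractions of internal exons whose length exceeds 1000 bp. A minimal sketch of that calculation is given below, assuming exon lengths are available as a plain list of base-pair counts; the function name and the example lengths are hypothetical, and no statistical test for enrichment is implemented here.

```python
ULTRA_LONG_BP = 1000  # threshold stated in the text: exons exceeding 1000 bp

def ultra_long_fraction(exon_lengths, threshold=ULTRA_LONG_BP):
    """Fraction of exons strictly longer than `threshold` base pairs."""
    if not exon_lengths:
        raise ValueError("exon_lengths must not be empty")
    n_ultra_long = sum(1 for length in exon_lengths if length > threshold)
    return n_ultra_long / len(exon_lengths)

# Hypothetical exon lengths in bp (illustration only, not data from the study).
regulated_exons = [120, 150, 1200, 3000]
fraction = ultra_long_fraction(regulated_exons)
```

      Applying the same function to a set of regulated or bound exons and to all expressed internal exons gives the two proportions that are being compared.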

      (3) The authors should state what fraction of ultra-long exons show cryptic splicing in the RBMX siRNA that are corrected by RBMXL2 overexpression (rather than just showing the 3 events). There's some confusion about the global nature of the conclusions relative to the data displayed.

      This is a good point. We have used the RNAseq information as suggested, and included a pie chart (Figure 4C) that includes this information.

      (4) It would be helpful if the authors could identify if there are some motifs more present in ultra-long exons than others.

      Good point. We have included k-mer analyses of the ultra-long exons bound by RBMX, and of ultra-long exons in the human genome more generally, in Figures 2H and 2I. We have also added the following text:

      K-mer analyses also showed that while ultra-long exons within mRNAs are rich in AT-rich sequences compared to shorter exons (Figure 2H), the ultra-long exons that are either regulated or bound by RBMX displayed enrichment of AG-rich sequences (Figure 2I), consistent with our identified RBMX-recognised sequences (Figure 2C).
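      As a rough illustration of what a k-mer enrichment comparison involves, the sketch below counts overlapping k-mers in a foreground and a background set of sequences and reports the frequency ratio. It is not the pipeline used for Figures 2H and 2I, which is not described here; the function names, the pseudocount, and any choice of k are assumptions made for the example.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping k-mers in one sequence, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_frequencies(sequences, k):
    """Pool k-mer counts over several sequences and convert them to relative frequencies."""
    counts = Counter()
    for seq in sequences:
        counts.update(count_kmers(seq, k))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def kmer_enrichment(foreground, background, k, pseudocount=1e-6):
    """Ratio of foreground to background frequency for every k-mer seen in either set.

    The pseudocount (an arbitrary choice for this sketch) avoids division by zero.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {
        kmer: (fg.get(kmer, 0.0) + pseudocount) / (bg.get(kmer, 0.0) + pseudocount)
        for kmer in set(fg) | set(bg)
    }
```

      In this sketch the foreground would be, for example, ultra-long exons bound by RBMX and the background all ultra-long exons; k-mers with a ratio well above 1 are candidates for enrichment.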

      (5) The authors should evaluate if RBMX-repressed 3' splice sites have similar or low splice site scores/strengths than natural 3' splice sites.

      We have added splice site score analyses in Figure 1F and Figure 1 Supplement 1B. These show that the cryptic splice sites repressed by RBMX are not significantly different from those that are normally used. We add the following text to accompany these figure panels:

      “Furthermore, analysis of splice site strength revealed that, unlike splice sites activated by RBMX (Figure 1 – Figure supplement 1B), alternative splice sites repressed by RBMX have comparable strength to more commonly used splice sites (Figure 1F). This means that RBMX operates as a splicing repressor in human somatic cells to prevent use of ‘decoy’ splice sites that could disrupt normal patterns of gene expression.”

      (6) The section "RBMX protein-RNA interactions may insulate important splicing signals from the spliceosome." is a very preliminary look at possible mechanisms. Can you integrate the RNA Seq and CLIP datasets to generate "splicing maps" that would provide more generalized insights? In fact, where possible, it would be great to integrate the iCLIP data from the same cell types to generate RNA splicing maps (with the KD RNA-seq data)

      We have added “RNA map-type” plots to integrate iCLIP data with splicing patterns (Figure 2 Figure supplement 1D and 1E), and made corresponding changes to the text.

      Additional changes

      We also made some extra changes to respond to the further points raised by reviewers.

      (1) We have carried out gene ontology analysis of those genes that contain RBMX-regulated ultra-long exons versus all ultra-long exons (now Figure 3A, and also Figure 3- Figure supplement 1A and 1B).

      (2) We have corrected the cartoon summarising the branch point analysis (now Figure 3 – Figure Supplement 2F).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work revealed an important finding that the blood-brain barrier (BBB) functionality changes with age and is more pronounced in males. The authors applied a non-invasive, contrast-agent-free approach of MRI called diffusion-prepared arterial spin labeling (DP-pCASL) to a large cohort of healthy human volunteers. DP-pCASL works by tracking the movement of magnetically labeled water (spins) in blood as it perfuses brain tissue. It probes the molecular diffusion of water, which is sensitive to microstructural barriers, and characterizes the signal coming from fast-moving spins as blood and slow-moving spins as tissue, using different diffusion gradients (b-values). This differentiation is then used to assess the water exchange rates (kw) across the BBB, which acts as a marker for BBB functionality. The main finding of the authors is that kw decreases with age, and in some brain regions, kw decreases faster in males. The neuroprotective role of the female sex hormone, estrogen, on BBB function is discussed as one of the explanations for this finding, supported by literature. The study also shows that BBB function remains stable until the early 60s and remarkably decreases thereafter.

      Strengths:

      The two main strengths of the study are the MRI method used and the amount of data. The authors employed a contrast-agent-free MRI method called ASL, which offers the opportunity to repeat such experiments multiple times without any health risk - a significant advantage of ASL. Since ASL is an emerging field that requires further exploration and testing, a study evaluating blood-brain barrier functionality is of great importance. The authors utilized a large dataset of healthy humans, where volunteer data from various studies were combined to create a substantial pool. This strategy is effective for statistically evaluating differences in age and gender.

      Weaknesses:

      R1.0: Gender-related differences are only present in some brain regions, not in the whole brain or gray matter - which is usually the assumption unless stated otherwise. From the title, this was not clear. Including simulations could increase readers' understanding related to model fitting and the interdependence of parameters, if present. The discussion follows a clear line of argument supported by literature; however, focusing solely on AQP4 channels and missing a critical consideration of other known/proven changes in transport mechanisms through the BBB and their effects substantially weakens the discussion. 

      Thanks for your insightful feedback and suggestions. We have made the following changes to the manuscript:

      (1) The title has been modified to highlight the sex differences in specific brain regions: “Age-Related Decline in Blood-Brain Barrier Function is More Pronounced in Males than Females in Parietal and Temporal Regions.”

      (2) To study the potential impact of prolonged ATT seen in males on estimated kw, we simulated kw distribution for females by adjusting ATT by +60 ms to match males' ATT. This led to marginally higher kw values (Supplemental Figure S2), suggesting that the kw difference between males and females is not a direct result of prolonged ATT. Additionally, we have added a section titled “Data and Code Availability Statements” in the revised manuscript to indicate that we are willing to share the reconstruction toolbox with interested groups. The toolbox is a standalone MATLAB-based program (no license required) to generate kw, CBF, and ATT maps, which can run on Windows or Mac computers.

      (3) We agree with the reviewer that BBB water exchange can be facilitated by other transport mechanisms, as we mentioned in the introduction: “Water exchange across the BBB occurs at a relatively high level and is mediated by passive diffusion, active co-transport through the endothelial membrane, and facilitated diffusion through the dedicated water channel, aquaporin-4 (AQP4), at the end-feet of astrocytes.” We emphasized our findings related to AQP4 based on the technical properties of DP-pCASL, which is more sensitive to the exchange occurring across astrocyte end-feet. We also acknowledge that different techniques can be helpful to study other components of BBB water exchange, and we have added the following discussion to the updated manuscript: “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method. These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging. In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. 
The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states. Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements.”

      Reviewer #1 (Recommendations For The Authors): 

      R1.1 The manuscript is well-organized and presents arguments in a logical order. The visual representation of results in the form of figures is sufficient (see style suggestions below). 

      Thanks for your suggestions on improving the figures. We have updated the figures for better visualization (please see our responses to R1.5, R1.6, R1.7, and R1.8).

      R1.2 It would be beneficial if the model/toolbox could be made publicly available so that fellow researchers from the community could apply and test it in their research. 

      We have added a section, “Data and code availability statements”, to the revised manuscript to indicate that we are willing to share the toolbox with interested groups (L529 in the annotated manuscript). The toolbox is a standalone MATLAB-based program (no license required) that generates kw, CBF, and ATT maps and can run on Windows or Mac computers. Indeed, we have been sharing our reconstruction toolbox with over 50 collaboration sites. The following screenshots are examples of the three steps performed by the toolbox (shared by one collaborator):

      Author response image 1.

      Step 1: Loading raw data and calculating the T1 map

      Author response image 2.

      Step 2: Motion correction and skull stripping

      Author response image 3.

      Step 3: kw, CBF and ATT quantification (nii files will be saved)

      R1.3 Line 46 states that the technique is novel, but it has been introduced and used before (Shao, et al. MRM 2019). It sure is innovative but the term novel is too strong and may confuse the readers that it is something new introduced in this manuscript.

      Thanks for the suggestion. We agree that the term ‘novel’ may cause confusion about the technique, and we have removed it from the revised manuscript (L48, L50).

      R1.4 Line 395, kw was generated using PLD = 1.8s with b = 0, 50 s/mm2. Is only one-time point enough for estimating kw? To me, it is not clear how robust is the kw estimation with only one PLD.

      According to the single-pass approximation (SPA) model (1), kw can be accurately estimated when the PLD is longer than the ATT. We recruited cognitively normal participants in this study and found the longest ATT to be 1526.7±117.4 and 1468.1±166.9 ms in aged (62-92 years) males and females, respectively. A PLD of 1.8 s was chosen to balance the SNR of the data and the accuracy of the model fitting, which should be sufficient for this study. However, for future studies involving diseased populations with prolonged ATT, a longer PLD should be used, or a multi-PLD protocol could be helpful to improve the robustness of quantification accuracy.

      We have added a limitation statement in the revised manuscript (L407): "A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2)."
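      The argument that a single 1.8 s PLD exceeds the longest group ATT can be checked with simple arithmetic. The sketch below uses only the values quoted above; treating the ± values as standard deviations and expressing the margin in SD units are assumptions made for illustration, not an analysis from the manuscript.

```python
PLD_MS = 1800.0  # single post-labeling delay used in the study (ms)

# Longest group ATT values quoted in the response, as (mean, spread) in ms.
# The spread is assumed here to be a standard deviation.
ATT_MS = {
    "aged males": (1526.7, 117.4),
    "aged females": (1468.1, 166.9),
}


def pld_margin(pld_ms, att_mean_ms, att_sd_ms):
    """Return (margin in ms, margin in SD units) between the PLD and the mean ATT."""
    margin_ms = pld_ms - att_mean_ms
    return margin_ms, margin_ms / att_sd_ms


for group, (mean_ms, sd_ms) in ATT_MS.items():
    margin_ms, margin_sd = pld_margin(PLD_MS, mean_ms, sd_ms)
    print(f"{group}: PLD exceeds mean ATT by {margin_ms:.1f} ms ({margin_sd:.2f} SD)")
```

      With these numbers the PLD exceeds the mean ATT by roughly 273 ms in aged males and 332 ms in aged females, which is why a longer PLD or a multi-PLD protocol is suggested for populations with prolonged ATT.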

      R1.5 Suggestion: Figure 3A, colormap for kw appears suboptimal. Regional differences are hard to see.

      Thanks for the suggestion. We have updated the range of the color scale (from [0, 200] to [70, 160]) to highlight the regional differences in the updated Figure 3:

      We prefer to keep the same blue colormap that we and our collaborators have been using in publications to maintain consistency. We also acknowledged the limitation of the spatial resolution of kw maps in the updated manuscript (L412): “To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently development of a motion-compensated diffusion weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in the future studies (e.g. 3.5 mm3 isotropic maps in 10 mins) (2)”

      R1.6 Suggestion: use same/similar colormaps for the same parameters (kw, ATT, CBF) to help the reader follow across Figures 3, 4, and 5.

      Thanks for your suggestion. We agree that using the same colors would make it easier for readers to follow the context. However, Figures 4 and 5 were created to show the age- and sex-dependent changes, so we used warm and cold colors to indicate effects of decrease and increase, respectively. We clarified the choice of colormap in the figure captions (L260, L284): “The effects of decrease or increase were represented by warm colors (yellow to red) and cold (gray to blue) colors, respectively.”

      R1.7 Suggestion: please be consistent with the ordering of parameters in Figures 3, 4, and 5.

      Thanks for the suggestion, we have updated Figure 3 to consistently show kw, CBF and ATT results in order from left to right:

      R1.8 Suggestion: use the same scaling (e.g.[|1.9|, |11 |] for Fig. 4, [|1.9|, |4|] for Figure 5) to enhance comparability across parameters in the subfigures.

      Thanks for the suggestion, we agree that the same scaling would enhance the comparability across parameters. We have updated the color scales for Figure 5 using maximal |T| = 4:

      However, the range of maximal |T| was relatively large for Figure 4 (i.e., 5 for kw, 11 for CBF, and 7 for ATT), and using the same color scale might oversaturate the regional responses or diminish the visibility of regional differences. Therefore, we prefer to keep the original color scale for Figure 4.

      R1.9 In Figure 5, the interaction of age with sex in kw parameter seems to be more on one side of the brain. What could be the reasons for possible lateralization? 

      We agree with the reviewer that the concentration of the age-by-sex interaction effects on one side of the brain is an interesting finding. While we do not currently have a clear explanation, we suspect it may relate to aging-related asymmetrical vascular burdens. Giannakopoulos et al. reported that vascular scores, indicating higher vascular burden, were significantly higher in the left hemisphere across all Clinical Dementia Rating scores. Moreover, the predominance of Alzheimer’s disease and vascular pathology in the right hemisphere correlated with significantly higher Clinical Dementia Rating scores (3). We added the following to the updated manuscript to discuss this potential mechanism (L370): “… We also observed an asymmetric effect on left and right brain hemispheres, which might be associated with asymmetrically developed vascular burdens in aging (3).”

      R1.10 A comparison between the present study and DCE MRI as well as other ASL methods evaluating BBB function with age is missing. ASL techniques probing transverse relaxation and DCE MRI have reported increased kw with age in humans as well as in animal models. What could be the reasons? 

      We agree with the reviewer that BBB water exchange measured by other methods should be sufficiently discussed, especially regarding their age-related changes. We added the following discussion in the updated manuscript (L415): “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6). In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13).”

      R1.11 Line 163/164, a rapid decrease of CBF in males in the region of the hippocampus is reported. It would be beneficial to discuss this in discussion further (has this been reported before, possible reasons, etc). 

      Thanks for the suggestion. We agree that the accelerated CBF decline in males in the hippocampus is an important finding, and we have added the following discussion to the revised manuscript (L300): "Furthermore, we found a more pronounced age-related decline in CBF in the hippocampus of males compared to females (Fig. 2, Supplemental Table S2). To the best of our knowledge, no study has previously reported this accelerated hippocampal CBF decline in males. This finding may be linked to the accelerated hippocampal volume loss in males, as reported in a study analyzing 19,793 generally healthy UK Biobank participants (14). Lower hippocampal perfusion has been associated with poor memory performance (15, 16), suggesting that males might be more vulnerable to potential cognitive decline (17)."

      R1.12 Lines 198-202 describe a simulation done to test the dependence of kw on ATT. This is important and could be explained more in detail. Adding simulation results (numeric or figure) to supplementary materials would increase reproducibility and understanding for others. 

      We apologize for not referencing the simulation results in the main text. We simulated the kw distribution for females by adjusting ATT by +60 ms to match males’ ATT, which led to marginally higher kw values. These results are shown in Supplemental Figure S2C (yellow):

      We have now referenced the simulation results in the updated manuscript (L206).

      R1.13 No limitations of the presented work are mentioned. A critical perspective would increase the scientific impact on future research decisions and implementation of this method by others. 

      Thanks for the suggestion, we agree the limitations need to be acknowledged. We have added a limitation paragraph in the revised manuscript (L406): "Limitations of the study and future directions: There are a few limitations of this study. A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2). To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently development of a motion-compensated diffusion weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in the future studies (e.g. 3.5 mm3 isotropic maps in 10 mins) (2). Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging (5, 6). 
In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological stages (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13). Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies. Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to the unavailability or limited availability in some cohorts. 
We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable."

      Reviewer #2 (Public Review):

      Summary: 

      This study used a novel diffusion-weighted pseudo-continuous arterial spin labelling (pCASL) technique to simultaneously explore age- and sex-related differences in brain tissue perfusion (i.e., cerebral blood flow (CBF) & arterial transit time (ATT) - a measure of CBF delivery to brain tissue) and blood-brain barrier (BBB) function, measured as the water exchange (kw) across the BBB. While age- and sex-related effects on CBF are well known, this study provides new insights to support the growing evidence of these important factors in cerebrovascular health, particularly in BBB function. Across the brain, the decline in CBF and BBB function (kw) and elevation in ATT were reported in older adults, after the age of 60, and more so in males compared to females. This was also evident in key cognitive regions including the insular, prefrontal, and medial temporal regions, stressing the consideration of age and sex in these brain physiological assessments. 

      Strengths: 

      Simultaneous assessment of CBF with BBB along with transit time and at the voxel-level helped elucidate the brain's vulnerability to age and sex-effects. It is apparent that the investigators carefully designed this study to assess regional associations of age and sex with attention to exploring potential non-linear effects. 

      Weaknesses: 

      R2.0 It appears that no brain region showed concurrent CBF and BBB dysfunction (kw), based on the results reported in the main manuscript and supplemental information. Was an association analysis between CBF and kw performed? There is a potential effect of the level of formal education on CBF (PMID: 12633147; 15534055), which could have been considered and accounted for as well, especially for a cohort with stated diversity (age, race, sex). 

      Thank you for your positive feedback and comments on the potential associations between BBB kw and other physiological parameters (e.g., CBF) and socioeconomic factors (e.g., education). We have made the following changes to the updated manuscript:

      (1) We conducted additional linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6. We found that BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be influenced by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.

      (2) One limitation of this study is the lack of information on participants’ geographical, cultural, physical characteristics, and socioeconomic factors. While we included race as a covariate to account for potential variations observed in previous research, race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes. We have acknowledged this limitation by adding the following discussion in the updated manuscript: “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research. However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health. For example, education has been shown to be highly relevant to regional CBF changes in AD. Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”

      Reviewer #2 (Recommendations For The Authors): 

      General comments: 

      I commend the authors on a very well-written and laid-out study. General remarks have been provided in the short assessment and public review sections. 

      We would like to thank the reviewer for the insightful suggestions and overall positive feedback. We have substantially revised and improved our manuscript; point-by-point responses can be found in the following sections and in the annotated manuscript.

      Specific comments: 

      Results: 

      R2.1 Line 127: "since race may influence the changes in perfusion and kw with aging, it was included as a covariate". It is not clear how race - a simplistic term for ethnicity or to be more specific ancestry has been shown to influence changes in perfusion? Is it known for a fact that for example, older Black people have lower/higher CBF or kw compared to Asians or Asians to Caucasian Americans? Can this be extrapolated to Japanese Brazilians having different patterns of regional CBF to Caucasian or Black Brazilians or similar patterns of CBF to Japanese people in Japan since they share similar race? Do Dutch people in the Netherlands share CBF characteristics to their descendants in the US or in South Africa? Would the geographical, cultural, and other physical characteristics of one's ethnicity or lineage impact CBF? Race is often used as a poor substitute for the complex interactions of physical, socioeconomic, and geopolitical factors that produce disparities that may have measurable biological effects including CBF. But it is not clear why being one race vs the other will impact CBF, without carefully parcelling out the many factors beyond biology, if any. Is any of the participants in the study mixed race? How about recently settled individuals who may identify for example as Black but have spent all their life up to adult years outside of the US and marked here in the study as simply African American? Not that I am saying this is the case. However this simplification may require more careful analysis. 

      In our study, no participant identified as mixed-race, and unfortunately we do not have additional information about participants’ specific ancestry or their geographical, cultural, and other physical characteristics. We acknowledge that race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes, including perfusion and BBB function. The use of race as a covariate in our study is intended to account for potential variations observed in previous research, rather than to imply a direct causal relationship.

      Research has shown differences in blood flow among racial groups (18, 19). However, these differences are not solely attributable to race, and they are also shaped by environmental exposures, lifestyle factors, healthcare access, and other social determinants of health (20). We have added the following discussion in the updated manuscript (L436): “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”

      R2.2 Figure 3: Could the standard deviation of the reported values be also stated so the variance can be appreciated? 

      Thanks for the suggestion. We have added the standard deviations of the kw, CBF, and ATT values to the updated Figure 3:

      R2.3 Discussions: Line 280: .."observed distinct trajectory of kw changes with aging as compared with CBF and ATT. I presume this as compared to the earlier statements (line 268) of pervasive increase in ATT and decrease in CBF across the brain. Were there any brain regions that showed increased ATT, decreased CBF and kw as a function of age or even sex?? Was there any association between CBF and kw in any brain regions, across the participants after controlling for sex differences? If there is a suspicion of early BBB dysfunction (line 286) preceding cognitive decline that has been also suspected with CBF, is this concomitant with CBF in most people? This could maybe make CBF an easier and more straightforward biomarker since its effects mirror that of BBB? I suspect it generally does not, even in healthy aging. It would have been great to shed more light on this with your results and in your discussion.

      Thank you for your comments. By 'distinct trajectory of kw changes with aging,' we refer to the ‘turning point’ in age at which kw starts declining. BBB kw remained relatively stable and began to decline in the early 60s, while CBF consistently decreased and ATT consistently increased with age, although the rates of change differed at 22 years and 36 years, respectively. Using linear regressions for voxel analysis, Figure 4 shows that age-dependent decreases in CBF and increases in ATT were observed in most of the brain. However, significant age-related decreases in kw were more localized to specific brain regions and were mostly accompanied by simultaneous decreases in CBF and increases in ATT. We highlighted this finding in the updated manuscript (L250): “In the brain regions showing significant age-related kw decreases (Fig. 4A), these decreases are mostly accompanied by CBF decreases (Fig. 4B) and ATT increases (Fig. 4C).”

      Thank you for your suggestion regarding the relationship between kw and CBF. We further conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6.

      This new supplemental table shows many interesting results. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years.

      We have added the following discussion to the updated manuscript (L307): “We observed a distinct trajectory of kw changes with aging compared to CBF and ATT. To study the potential regional associations between kw and CBF and ATT, we conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining), respectively. The results are shown in Supplemental Table S6. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, PHG, and MTL in participants aged 8-61 years (when kw was relatively consistent across ages), but no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional brain regions, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be affected by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.”

      Other notes: 

      R2.4 While reading the results section, two things jumped out at me when I saw the sex differences: 1) hematocrit and 2) menopausal status. I saw in the discussion that these were touched on. I may have missed this in the methods: was hematocrit collected and included in the parameter estimates? Was the menopausal status, including ERT (estrogen replacement therapies), recorded and factored in? If not, these could be included as limitations that may confound the results, especially when the age groups were split to include a group potentially comprising both pre- and post-menopausal females (36-61).

      We do not have information about hematocrit or menopausal status, so these factors were not included in the data analysis. We agree this is a limitation of the current study, and we discuss it in the updated manuscript (L442): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      R2.5 The general vascular health of the cohort is not well described, especially if some of the participants were from a sickle cell study. While they are cognitively normal and free from major medical illnesses or neurological disorders, did the sample also include individuals with considerable vascular risk factors and metabolic syndrome (known to affect CBF), especially in the older cohort?

      We agree with the reviewer that vascular health can significantly impact perfusion and BBB function. Since the data presented in this study were collected from multiple cohorts, vascular risk factors were not available in all cohorts and thus were not included as covariates in the data analysis. To account for potential vascular variations across participants, we included CBF and ATT as covariates in our analysis of age-related BBB kw changes. We have added discussion in the updated manuscript (L442, same as our response to the previous comment): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      References:

      (1) K. S. St Lawrence, D. Owen, D. J. Wang, A two-stage approach for measuring vascular water exchange and arterial transit time by diffusion-weighted perfusion MRI. Magn Reson Med 67, 1275-1284 (2012).

      (2) X. Shao, C. Zhao, Q. Shou, K. S. St Lawrence, D. J. Wang, Quantification of blood–brain barrier water exchange and permeability with multidelay diffusion‐weighted pseudo‐continuous arterial spin labeling. Magnetic Resonance in Medicine (2023).

      (3) P. Giannakopoulos, E. Kövari, F. R. Herrmann, P. R. Hof, C. Bouras, Interhemispheric distribution of Alzheimer disease and vascular pathology in brain aging. Stroke (2009).

      (4) A. Mahroo, S. Konstandin, M. Günther, Blood–Brain Barrier Permeability to Water Measured Using Multiple Echo Time Arterial Spin Labeling MRI in the Aging Human Brain. Journal of Magnetic Resonance Imaging 59, 1269-1282 (2024).

      (5) Y. Ohene et al., Increased blood–brain barrier permeability to water in the aging brain detected using noninvasive multi‐TE ASL MRI. Magnetic resonance in medicine 85, 326-333 (2021).

      (6) B. R. Dickie, H. Boutin, G. J. Parker, L. M. Parkes, Alzheimer's disease pathology is associated with earlier alterations to blood–brain barrier water permeability compared with healthy ageing in TgF344‐AD rats. NMR in Biomedicine 34, e4510 (2021).

      (7) Y. Ying et al., Heterogeneous blood‐brain barrier dysfunction in cerebral small vessel diseases. Alzheimer's & Dementia (2024).

      (8) V. Zachariou et al., Regional differences in the link between water exchange rate across the blood–brain barrier and cognitive performance in normal aging. GeroScience, 1-18 (2023).

      (9) Y. Zhang et al., Increased cerebral vascularization and decreased water exchange across the blood-brain barrier in aquaporin-4 knockout mice. PLoS One 14, e0218415 (2019).

      (10) Y. Ohene et al., Non-invasive MRI of brain clearance pathways using multiple echo time arterial spin labelling: an aquaporin-4 study. NeuroImage 188, 515-523 (2019).

      (11) Y. V. Tiwari, J. Lu, Q. Shen, B. Cerqueira, T. Q. Duong, Magnetic resonance imaging of blood–brain barrier permeability in ischemic stroke using diffusion-weighted arterial spin labeling in rats. Journal of Cerebral Blood Flow & Metabolism 37, 2706-2715 (2017).

      (12) Z. Wei et al., Non-contrast assessment of blood-brain barrier permeability to water in mice: an arterial spin labeling study at cerebral veins. NeuroImage, 119870 (2023).

      (13) Y. Jia et al., Transmembrane water-efflux rate measured by magnetic resonance imaging as a biomarker of the expression of aquaporin-4 in gliomas. Nature Biomedical Engineering 7, 236-252 (2023).

      (14) L. Nobis et al., Hippocampal volume across age: Nomograms derived from over 19,700 people in UK Biobank. NeuroImage: Clinical 23, 101904 (2019).

      (15) S. Rane et al., Inverse correspondence between hippocampal perfusion and verbal memory performance in older adults. Hippocampus 23, 213-220 (2013).

      (16) S. Heo et al., Resting hippocampal blood flow, spatial memory and aging. Brain research 1315, 119-127 (2010).

      (17) O. Gannon, L. Robison, A. Custozzo, K. Zuloaga, Sex differences in risk factors for vascular contributions to cognitive impairment & dementia. Neurochemistry international 127, 38-55 (2019).

      (18) A. E. Leeuwis et al., Cerebral blood flow and cognitive functioning in a community-based, multi-ethnic cohort: the SABRE study. Frontiers in aging neuroscience 10, 279 (2018).

      (19) L. R. Clark et al., Association of cardiovascular and Alzheimer’s disease risk factors with intracranial arterial blood flow in Whites and African Americans. Journal of Alzheimer's Disease 72, 919-929 (2019).

      (20) D. R. Williams, S. A. Mohammed, Discrimination and racial disparities in health: evidence and needed research. Journal of behavioral medicine 32, 20-47 (2009).

      (21) N. Scarmeas et al., Association of life activities with cerebral blood flow in Alzheimer disease: implications for the cognitive reserve hypothesis. Archives of neurology 60, 359-365 (2003).

      (22) N.-T. Chiu, B.-F. Lee, S. Hsiao, M.-C. Pai, Educational level influences regional cerebral blood flow in patients with Alzheimer’s disease. Journal of Nuclear Medicine 45, 1860-1863 (2004).

      (23) R. C. Gur et al., Gender differences in age effect on brain atrophy measured by magnetic resonance imaging. Proceedings of the National Academy of Sciences 88, 2845-2849 (1991).

      (24) M. J. Cipolla, J. A. Godfrey, M. J. Wiegman, The effect of ovariectomy and estrogen on penetrating brain arterioles and blood-brain barrier permeability. Microcirculation 16, 685-693 (2009).

      (25) A. C. Wilson et al., Reproductive hormones regulate the selective permeability of the blood-brain barrier. Biochim Biophys Acta 1782, 401-407 (2008).

      (26) M. S. Stringer et al., Tracer kinetic assessment of blood–brain barrier leakage and blood volume in cerebral small vessel disease: Associations with disease burden and vascular risk factors. NeuroImage: Clinical 32, 102883 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, by using simulation, in vitro and in vivo electrophysiology, and behavioral tests, Peng et al. nicely showed a new approach for the treatment of neuropathic pain in mice. They found that terahertz (THz) waves increased Kv conductance and decreased the frequency of action potentials in pyramidal neurons in the ACC region. Behaviorally, terahertz (THz) waves alleviated neuropathic pain in the mouse model. Overall, this is an interesting study. The experimental design is clear, the data is presented well, and the paper is well-written. I have a few suggestions.

      (1) The authors provide strong theoretical and experimental evidence for the impact of voltage-gated potassium channels by terahertz wave frequency. However, the modulation of action potential also relies on non-voltage-dependent ion channels. For example, I noticed that the RMP was affected by THz application (Figure 3F) as well. As the RMP is largely regulated by the leak potassium channels (Tandem-pore potassium channels), I would suggest testing whether terahertz wave photons have also any impact on the Kleak channels as well.

      Thank you for your positive comment and for providing us with this valuable suggestion. After testing the leak K+ current with and without HFTS on the SNI model, we observed a notable increase in the leak K+ current with HFTS when the holding potential surpassed -40 mV (please see the revised Figs. 2m and n). This finding prompted us to delve deeper into the shifts in the resting membrane potential (RMP). The data, along with statistical analysis, are detailed in Tables S1-3.

      (2) The activation curves of the Kv currents in Figure 2h seem to be not well-fitted. I would suggest testing a higher voltage (>100 mV) to collect more data to achieve a better fitting.

      Thanks for your advice. We repeated the experiment while maintaining the voltage of patched neurons at a higher level (>100 mV) to collect ample data for better fitting. The outcomes are illustrated in the revised Figs. 2g-j. The data reveal a significant increase in K+ conductance in the HFTS group as compared to the SNI group. We have integrated these findings into the revised manuscript, replacing the earlier results.

      (3) In the part of behavior tests, the pain threshold increased after THz application and lasted within 60 mins. I suggest conducting prolonged tests to determine the end of the analgesic effect of terahertz waves.

      Thank you for your insightful comment. We echo your curiosity about the duration of the HFTS effect. In the process of revising our work, we conducted a comparative analysis of the analgesic duration resulting from 10-minute and 15-minute applications of HFTS. The findings are visualized in the revised Fig. 5c. Our observations indicate that after 160 minutes, the PWMT value for the 15-minute HFTS group decreased to a level comparable to that of the SNI group. Meanwhile, the analgesic effects persisted for 140 minutes in the case of the 10-minute HFTS application. These results imply a direct correlation between the duration of HFTS application and the duration of analgesia.

      (4) Regarding in vivo electrophysiological recordings, the post-HFTS recordings were acquired from a time window of up to 20 min. It seems that the HFTS effect lasted for minutes, but this was not tested in vitro where they looked at potassium currents. This long-lasting effect of HFTS is interesting. Can the authors discuss it and its possible mechanisms, or test it in slice electrophysiological experiments?

      Thank you for your comment. The in vivo electrophysiological recordings showed that the effect of HFTS can endure for at least 20 minutes, and it lasted even longer in the behavioral assessments. Taking your advice, we used slice electrophysiological recordings for further testing. Following a 15-minute application of HFTS, we evaluated the K+ current at 5 and 20 minutes after incubation. We observed a substantial and lasting increase in K+ current, with the effect persisting for at least 20 minutes (refer to Fig. 2l). This confirms the long-lasting influence of HFTS. The relevant data and statistical analysis are documented in Tables S1-2.

      (5) How did the authors arrange the fiber for HFTS delivery and the electrode for in vivo multi-channel recordings? Providing a schematic illustration in Figure 4 would be useful.

      Thank you for your comment. To enhance the reader's understanding of the HFTS delivery device during multi-channel recording, we have included a schematic illustration in Fig. 4a in the revised manuscript. The top portion of Fig. 4a depicts a quantum cascade laser (QCL) with a center frequency located at approximately 36 THz. This laser is then connected to the recording electrode via a PIR fiber. The left section illustrates the detailed structure of the recording electrode.

      (6) Some grammatical errors should be corrected.

      Thank you for your thorough review. We have carefully checked and corrected grammar errors we found throughout the entire text to ensure that readers can better comprehend the content of the article.

      Reviewer #2 (Public Review):

      In this manuscript, Peng et al. reported that 36 THz high-frequency terahertz stimulation (HFTS) can suppress the activity of pyramidal neurons by enhancing the conductance of voltage-gated potassium channels. The authors also demonstrated the effectiveness of using 36 THz HFTS for treating neuropathic pain.

      Strengths:

      The manuscript is well written and the conclusions are supported by robust results. This study highlighted the potential of using 36 THz HFTS for neuromodulation.

      Weaknesses:

      More characterization of HFTS is needed, so the readers can have a better assessment of the potential usage of HFTS in their own applications.

      Thank you for your suggestion. We have created schematic diagrams illustrating the HFTS delivery (Fig. 4a and Fig. 5a in the revised manuscript). Fig. 4a presents the structure designed for in vivo multi-channel recording. Fig. 5a shows the structure used in the behavior tests, in which the recording electrode is replaced by a hollow metal tube, allowing the PIR fiber to pass through the tube and target the ACC region of the mice.

      (1) It would be very helpful to estimate the volume of tissue that can be influenced by HFTS. It is not clear how 15 mins HFTS was chosen for this functional study. Does a longer time have a stronger effect? A better characterization of the relationship between the stimulus duration of HFTS and its beneficial effects would be very useful.

      Thank you for your feedback. The degree of tissue influence is directly related to the size of the spot emerging from the fiber outlet. In our experiment, we used a PIR fiber with a 630 nm inner core diameter to propagate high-frequency THz waves. This core features a refractive index of 2.15 and has an effective numerical aperture (NA) of 0.35 ± 0.05.
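
      As a rough illustration of how the fiber parameters quoted above relate to beam spread, the sketch below converts the 36 THz center frequency to a free-space wavelength and converts the stated NA of 0.35 ± 0.05 to a divergence half-angle. It assumes the textbook relation NA = n·sin(θ) and a purely geometric cone leaving the fiber tip; the default exit medium (n = 1.0), the function names, and the neglect of absorption and scattering in tissue are assumptions made for illustration, not details taken from the study.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_um(freq_thz: float) -> float:
    """Free-space wavelength in micrometres for a frequency given in THz."""
    return C / (freq_thz * 1e12) * 1e6


def half_angle_deg(na: float, n_medium: float = 1.0) -> float:
    """Divergence half-angle in degrees from NA = n_medium * sin(theta)."""
    return math.degrees(math.asin(na / n_medium))


def spot_diameter(core_diameter: float, distance: float, na: float,
                  n_medium: float = 1.0) -> float:
    """Geometric spot diameter at `distance` from the fiber tip.

    Uses a simple ray-optics cone; `core_diameter` and `distance` must be
    in the same length unit. Absorption and scattering are ignored.
    """
    theta = math.asin(na / n_medium)
    return core_diameter + 2.0 * distance * math.tan(theta)


# NA range quoted in the response (0.35 +/- 0.05); exit medium assumed n = 1.0.
print(f"wavelength at 36 THz: {wavelength_um(36.0):.2f} um")
for na in (0.30, 0.35, 0.40):
    print(f"NA={na:.2f}: half-angle {half_angle_deg(na):.1f} deg")
```

      A ray-optics cone of this kind only bounds the illuminated cross-section; estimating the affected tissue volume would additionally require the absorption of tissue at this wavelength, which is not given here.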

      Our decision to apply HFTS for 15 minutes in the behavioral study was primarily based on observations from in vivo multi-channel recordings. Specifically, we noticed a considerable reduction in the average firing rate of PYR cells after 15 minutes of HFTS exposure. To further investigate the correlation between the duration of HFTS stimulation and its effects, we conducted a comparative study using a 10-minute HFTS session. The results, depicted in revised Fig. 5c, reveal that the PWMT value decreased to the level seen in the SNI group after approximately 160 minutes following 15 minutes of HFTS, and after about 140 minutes with 10 minutes of HFTS. This suggests a direct relationship between the length of HFTS application and its beneficial outcomes.

      (2) How long does the behavioral effect last after 15 minutes of HFTS? Figure 5b only presents the behavioral effect for one hour, but the pain level is still effectively reduced at this time point. The behavioral measurement should last until pain sensitization drops back to pre-stim level.

      Thank you for your feedback. A similar question was also raised by Reviewer 1. As depicted in Fig. 5c, the analgesic effects lasted for 140-160 minutes after a 10-15 minute application of HFTS. Based on these findings, we conclude that in the SNI model, targeting the ACC with HFTS for 10-15 minutes produces an analgesic effect that lasts for roughly 140-160 minutes. This provides insight into the potential clinical applications of HFTS treatment and the duration of relief it can achieve.

      (3) Although the manuscript only tested in ACC, it will also be useful to demonstrate the neural modulation effect on other brain regions. Would 36 THz HFTS also robustly modulate activities in other brain regions? Or are different frequencies needed for different brain regions?

      Thank you for your comment. We hypothesize that light waves at a frequency of approximately 36 THz effectively modulate neuronal activities in various brain regions, primarily due to their impact on K channels. Additionally, we speculate that the application of THz waves at different frequencies may influence other channels, such as Na and Ca channels, potentially facilitating or inhibiting neuronal activities. We believe this is a fascinating and significant area of research to explore in the future.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Peng et al. presents intriguing data indicating that high-frequency terahertz stimulation (HFTS) of the anterior cingulate cortex (ACC) can alleviate neuropathic pain behaviors in mice. Specifically, the investigators report that terahertz (THz) frequency stimulation widens the selectivity filter of potassium channels thereby increasing potassium conductance and leading to a reduction in the excitability of cortical neurons. In voltage clamp recordings from layer 5 ACC pyramidal neurons in acute brain slice, Peng et al. show that HFTS enhances K current while showing minimal effects on Na current. Current clamp recording analyses show that the spared nerve injury model of neuropathic pain decreases the current threshold for action potential (AP) generation and increases evoked AP frequency in layer 5 ACC pyramidal neurons, which is consistent with previous studies. Data are presented showing that ex-vivo treatment with HFTS in slice reduces these SNI-induced changes to excitability in layer 5 ACC pyramidal neurons. The authors also confirm that HFTS reduces the excitability of layer 5 ACC pyramidal neurons via in vivo multi-channel recordings from SNI mice. Lastly, the authors show that HFTS is effective at reducing mechanical allodynia in SNI using both the von Frey and Catwalk analyses. Overall, there is considerable enthusiasm for the findings presented in this manuscript given the need for non-pharmacological treatments for pain in the clinical setting.

      Strengths:

      The authors use a multifaceted approach that includes modeling, ex-vivo and in-vivo electrophysiological recordings, and behavioral analyses. Interpretation of the findings is consistent with the data presented. This preclinical work in mice provides new insight into the potential use of directed high-frequency stimulation to the cortex as a primary or adjunctive treatment for chronic pain.

      Weaknesses:

      There are a few concerns noted that if addressed, would significantly increase enthusiasm for the study.

      (1) The left Na current trace for SNI + HFTS in Figure 2B looks to have a significant series resistance error. Time constants (tau) for the rate of activation and inactivation for Na currents would be informative.

      Thank you for your feedback. We have carefully considered your comments and made several adjustments in the revised Figs. 2b-f to improve clarity and accuracy. First, we compared the time constants (tau) between the SNI group and the SNI+HFTS group. These time constants represent the latency of Na current activation or inactivation relative to the half-activated/inactivated voltage. Our analysis reveals no statistically significant difference in tau between the two groups for either the activation or the inactivation curves. Second, we have updated the sample traces in Fig. 2b of the revised manuscript. These new traces illustrate that tau does not significantly differ between the SNI and SNI+HFTS groups. We believe that these modifications make the data more accessible and understandable for readers.

      (2) It is unclear why an unpaired t-test was performed for paired data in Figure 2. Also, statistical methods and values for non-significant data should be presented.

      Thank you for your comment. We believe you are referring to the results in Fig. 3. We agree that one-way ANOVA should be used to analyze these data, since more than two groups are compared. We therefore re-analyzed the data in Figs. 3g-k using one-way ANOVA and have included detailed statistical methods and P values in the revised manuscript.

      (3) It would seem logical to perform HFTS on ACC-Pyr neurons in acute slices from sham mice (i.e. Figure 3 scenario). These experiments would be informative given the data presented in Figure 4.

      Thank you for your valuable advice. During the revision process, we performed HFTS on ACC-PYR neurons in acute slices obtained from sham mice. The findings from this experiment have been integrated into the updated Fig. 3, where the sham group is represented by the green line and histogram (the revised Fig. 3 in the manuscript). It is noteworthy that a significant decrease in spike frequency was observed in the sham mice following HFTS.

      (4) As the data are presented in Figure 4g, it does not seem as if SNI significantly increased the mean firing rate for ACC-Pyr neurons, which is observed in the slice. The data were analyzed using a paired t-test within each group (sham and SNI), but there is no indication that statistical comparisons across groups were performed. If the argument is that HFTS can restore normal activity of ACC-Pyr neurons following SNI, this is a bit concerning if no significant increase in ACC-Pyr activity is observed in in-vivo recordings from SNI mice.

      Thank you for highlighting the inaccuracies in the analysis. After reviewing the data, we re-analyzed them using alternative statistical methods. In the revised version, since the data did not follow a normal distribution, we employed Wilcoxon matched-pairs signed-rank tests within the sham and SNI groups, and Mann-Whitney tests between the sham and SNI groups.

      Upon comparing the statistical outcomes across the groups, we found that the mean firing rate of 130 ACC neurons in SNI mice was significantly higher compared to that of 108 ACC neurons in sham mice (P = 0.0447, Mann-Whitney test). Notably, the mean firing rate of ACC-PYR exhibited a more pronounced increase with a P value of 0.0274 in SNI pre-HFTS versus sham pre-HFTS, while the mean firing rate of ACC-INT did not display a significant change across the groups. These findings align with the observations we made in the slice, reinforcing the validity of our results.
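
      For readers unfamiliar with the between-group test named above, the following is a minimal sketch of a two-sided Mann-Whitney U test using the normal approximation with a tie-corrected variance. It is not the authors' analysis pipeline: the function name and the firing-rate values are hypothetical, no continuity correction is applied, and exact p-values for small samples (as reported by statistics packages) may differ. The within-group Wilcoxon matched-pairs test is not shown.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample x, two-sided p-value). Tied values receive
    average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical mean firing rates (Hz); NOT data from the study.
sham_rates = [1.2, 0.8, 2.1, 1.5, 0.9, 1.1, 1.7, 1.3]
sni_rates = [2.4, 1.9, 3.0, 1.6, 2.8, 2.2, 1.4, 2.6]
u, p = mann_whitney_u(sni_rates, sham_rates)
print(f"U = {u:.1f}, two-sided p = {p:.4f}")
```

      A rank-based test of this kind is a natural choice when, as stated above, the firing rates are not normally distributed, because it depends only on the ordering of the pooled observations.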

      (5) The authors indicate that the effects of HFTS are due to changes in Kv1.2. However, they do not directly test this. A blocking peptide or dendrotoxin could be used in voltage clamp recordings to eliminate Kv1.2 current and then test if this eliminates the effects of HFTS. If K current is completely blocked in VC recordings then the authors can claim that currents they are recording are Kv1.1 or 1.2.

      Thank you for your kind suggestion. In our research, we employed the Kv1.2 structure as a model to determine the response frequency of terahertz waves. Through both in vitro and in vivo experiments, we were able to demonstrate that the frequency of approximately 36 THz affects the Kv channel and its corresponding spike frequency. Upon analyzing the action potential waveform, we observed a notable variance in the resting membrane potential (RMP). This RMP is predominantly controlled by leak potassium channels, specifically the Tandem-pore potassium channels. In accordance with the recommendation of reviewer 1, we have addressed this particular aspect of our experimentation in the revised manuscript.

      We agree that blocking peptides or dendrotoxin should be used to eliminate the Kv1.2 current. However, we encountered problems with the purchase and delivery of these drugs. We have therefore added an explanation to the Discussion emphasizing the value of this pharmacological experiment, which we hope to perform in future work.

      (6) The ACC is implicated in modulating the aversive aspect of pain. It would be interesting to know whether HFTS could induce conditioned place preference in SNI mice via negative reinforcement (i.e. alleviation of spontaneous pain due to the injury). This would strengthen the clinical relevance of using HFTS in treating pain.

      Thank you for this valuable advice. We share your interest in this experiment and fully recognize the importance of exploring this area further. At present, limitations of our equipment and platform prevent us from conducting the necessary tests, but we remain committed to pursuing this research in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) The study suggests that the effects of their tumor models on mouse behavior are largely non-specific to the tumor, as most behaviors are rescued by analgesic treatment. So, most of the changes were likely due to site-specific pain and not a unique signal from the tumor.

      The tumor generates pain at the site where it is implanted, and this pain is likely amplified by the oral activities that tumor-bearing mice have to engage in. As there is no pain in the absence of the tumor, the pain is, by definition, caused by the tumor, not by the site. Concerning the relationship between pain and behavior, the behavioral assays undertaken in our study (nesting, cookie test, wheel running) were very limited in scope. Two of these assays (nesting, cookie test) require use of the oral cavity. Only nesting and wheel running were assessed in the context of treatment for pain. Nesting behavior was completely restored with carprofen and buprenorphine treatment, suggesting that in the absence of pain, mice were able to make perfect nests. Consistent with this, carprofen- and buprenorphine-treated animals also gained weight, indicating that eating (another activity dependent on the oral cavity) was also restored. Wheel running, an activity that does not rely on the oral cavity, was only partially restored with drug treatment. While additional behavioral tests are necessary to confirm this finding, the data suggest that pain-independent information relayed to the brain accounts for this decline in wheel running.

      Reviewer #2:

      (1) The main claim is that tumor-infiltrating nerves underlie cancer-induced behavioral alterations, but the experimental interventions are not specific enough to support this. For example, all TRPV1 neurons, including those innervating the skin and internal organs, are ablated to examine sensory innervation of the tumor. Within the context of cancer, behavioral changes may be due to systemic inflammation, which may alter TRPV1 afferents outside the local proximity of tumor cells. A direct test of the claims of this paper would be to selectively inhibit/ablate nerve fibers innervating the tumor or mouth region.

      We agree with the reviewer that a direct test of the hypothesis would require selectively inhibiting the nerve fibers innervating the tumor and assessing the impact on behavior. Studies in the lab are on-going using pharmacological interventions to do this. These studies are beyond the scope of this current manuscript.

      (2) Behavioral results from TRPV1 neuron ablation studies are in part confounded by differing tumor sizes in ablated versus control mice. Are the differences in behavior potentially explained by the ablated animals having significantly smaller tumors? The differences in tumor sizes are not negligible. One way to examine this possibility might be to correlate behavioral outcomes with tumor size.

      As suggested by the reviewer, we have graphed nesting scores and time-to-interact (cookie test) relative to tumor volume. In both cases, we used simple linear regression to fit the data and analyzed the slopes of the lines. In the case of nesting, there was no significant difference between the slopes. This is now included as Supplemental Figure 4A. In the case of the cookie test, there was a significant difference between the slopes. This is now included as Supplemental Figure 4B. Graphing the data in this way allows one to look at any given tumor volume and infer the nesting score and the time-to-interact for the two groups of mice. The linear regression model fits the time to interact with the cookie reasonably well; thus, from this graph, we can see that at any given tumor volume the time to interact with the cookie was generally shorter in TRPV1cre::DTAfl/wt animals as compared to C57BL/6 mice. Unfortunately, the linear regression does not fit the nesting data very well, and thus it is more difficult to compare nesting scores at a given tumor volume.

      The following text has been added to the results section.

      Given the impact of nociceptor neuron ablation on tumor growth, we wondered whether differences in tumor volume contributed to the behavioral differences we noted. Thus, the behavior data were graphed as a function of tumor volume (Supplemental Fig 4A, B). A simple linear regression model was used to fit the data. In the case of nesting scores, the linear regression did not fit the data points very well making it difficult to assess nesting scores at a given tumor volume (Supplemental Fig 4A). However, the linear regression model fit the time to interact data better. Here, the graph suggests that tumor volume did not influence behavior as at any given tumor volume the time to interact with the cookie is generally smaller in TRPV1-Cre::Floxed-DTA animals as compared to C57BL/6 animals (Supplemental Fig 4B).
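
      The slope comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes ordinary least squares for each genotype and a pooled-residual t statistic for the difference between slopes. The function names and all data values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns slope, intercept, residual sum of squares, Sxx and n.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, ss_res, sxx, n


def compare_slopes(x1, y1, x2, y2):
    """t statistic for the difference between two regression slopes.

    Uses the pooled residual variance of both fits, with
    n1 + n2 - 4 degrees of freedom (two parameters per line).
    """
    b1, _, ss1, sxx1, n1 = fit_line(x1, y1)
    b2, _, ss2, sxx2, n2 = fit_line(x2, y2)
    df = n1 + n2 - 4
    pooled_var = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    t_stat = (b1 - b2) / se_diff
    return b1, b2, t_stat, df


if __name__ == "__main__":
    # Hypothetical values: tumor volume (mm^3) vs. time to interact (s).
    volume_ablated = [50, 120, 200, 310, 400]
    time_ablated = [12, 15, 21, 24, 33]
    volume_control = [60, 150, 260, 380, 520]
    time_control = [20, 41, 58, 90, 118]
    b1, b2, t_stat, df = compare_slopes(
        volume_ablated, time_ablated, volume_control, time_control
    )
    print(f"slopes: {b1:.3f} vs {b2:.3f}; t = {t_stat:.2f} on {df} df")
```

      The t statistic would then be compared against a t distribution with the reported degrees of freedom. A poor fit, as noted for the nesting data, inflates the residual variance and makes this comparison less informative.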

      Reviewer #3:

      (1) The authors mention in their Discussion the need for additional experiments. Could they also include / comment on the potential impact on the anti-tumor immune system in their model?

      The following text has been added to the discussion:

      Neuro-immune interactions have been studied in the context of a variety of conditions including, but not limited to, infection 109, inflammation 110,111, homeostasis in the gut 112-114, as well as neurological diseases 115,116. Neuro-immune communications in the context of cancer and behavior have also been studied (e.g., sickness behavior, depression) 117-119; however, these studies did not assess these interactions at the tumor bed. Investigations into neuro-immune interactions occurring within primary malignancies which harbor nerves have shed light on these critical communications. In the context of melanoma, which is innervated by sensory nerves, we identified that release of the neuropeptide calcitonin gene related peptide (CGRP) induces immune suppression. This effect is mediated by CGRP binding to its receptor, RAMP1, which is expressed on CD8+ T cells 49. A study utilizing a different syngeneic model of oral cancer similarly found an immune suppressive role for CGRP 120-122. These studies demonstrate that neuro-immune interactions occur at the tumor bed. Our current findings indicating that tumor-infiltrating nerves connect to a circuit that includes regions within the brain suggest that neuro-immune interactions within the peripheral malignancy may contribute to the behavioral alterations we studied.

      (2) The authors mention the importance of inflammation contributing to pain in cancer but do not clearly highlight how this may play a role in their model. Can this be clarified?

      The following text has been added to the discussion section of the manuscript.

      Moreover, given that carprofen and buprenorphine decrease inflammation 104, their ability to restore normal nesting and cookie test behaviors (which require the use of the oral cavity where the tumor is located) suggests that inflammation at the tumor site contributed to the decline in these behaviors in vehicle-treated animals. Since both drugs were given systemically and each only partially restored wheel running, it suggests that systemic inflammation alone cannot fully account for the decline in wheel running seen in vehicle-treated animals. We posit that the inflammation- and pain-independent component of this behavioral decline is mediated via the transcriptional and functional alterations in the cancer-brain circuit.

      (3) The tumor model apparently requires isoflurane injection prior to tumor growth measurements. This is different from most other transplantable types of tumors used in the literature. Was this treatment also given to control (i.e., non-tumor) mice at the same time points? If not, can the authors comment on the impact of isoflurane (if any) in their model?

      Mice in all groups (tumor and non-tumor) were treated with isoflurane. This important detail has been added to the methods section.

      (4) The authors emphasize in several places that this is a male mouse model. They mention this as a limitation in the Discussion. Was there an original reason why they only tested male mice?

      The following text has been added in the discussion section:

      Head and neck cancer is predominantly a cancer in males; it occurs in males three times more often than in females 123, and this disparity increases in certain parts of the world. While smoking cigarettes and drinking alcohol are risk factors for HPV negative head and neck squamous cell carcinoma, even males that do not smoke and drink have a higher susceptibility for this cancer than females 124,125. Thus, our studies used only male mice. However, we do recognize that females also get this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist in males and females.

      Therefore, another project in the lab has been to compare disease characteristics (including innervation and behavior) in male and female mice. The findings from this second study are the topic of a separate manuscript.

      Recommendations For The Authors:

      Reviewing editor:

      (1) Tumors can communicate with the brain via blood-borne agents from the tumor itself or immune cells that are activated by the tumor in addition to neurons that invade the tumor. The xia and malaise that accompanies some tumors can be mediated by direct innervation and/or the humoral factors because both can activate the same parabrachial pathway. This paper makes the case for the direct innervation being important but ignores the possibility of both being involved. The interesting observation that innervation supports tumor growth (perhaps via substance P) is troublesome because the slower appearance of behavioral consequences (Figures 4 & 5) could be attributed to the smaller tumor size. A nice control for humoral effects would be to implant the tumor cells someplace in the body where innervation does not occur (if possible) and then examine behavioral outcomes.

      In the course of several projects, we have implanted different tumor cell lines in different locations in mice (oral cavity, hind limb, flank, peritoneal cavity). In each location, tumor innervation occurs. This is not a phenomenon found only in mice as we completed an immunohistological survey of human cancers from different sites and found they are all innervated (PMID 34944001). These data are consistent with tumor and locally-released factors that recruit nerves to the tumor bed (PMID: 30327461)(PMID: 32051587)(PMID: 27989802). Thus, an implantation site that does not result in tumor innervation is currently unknown and likely does not exist.

      (2) The authors should address whether there is an inflammatory component in this tumor model.

      MOC2-7 tumors have been characterized as non-inflamed and poorly immunogenic 129-131.

      This information has been added to the methods section.

      (3) The RTX experiment in Figure 5 would be more compelling if the drug was injected directly into the tumor rather than injecting it in the flank, thus ablating all TRPV1-expressing neurons as in the genetic approach.

      While we agree with the reviewer that ablating the TRPV1-expressing neurons at the tumor site directly would be ideal, RTX treatment takes approximately one week for ablation to occur, and a significant amount of inflammation is associated with this. Therefore, we wait a total of 4 weeks for the inflammation to resolve. By this time, tumors have generally reached sacrifice criteria. Thus, this approach would not enable the question to be answered. Moreover, we are not aware of any studies in which RTX has been injected in the oral cavity or face. While RTX is utilized clinically to treat pain, it is typically administered intrathecally, epidurally or intra-ganglionically (PMID: 37894723).

      (4) The authors address affective aspects of pain but do not adequately address the sensory aspects, e.g., sensitivity to touch, heat and/or cold. They attribute the decrease in food disappearance (consumption) and nest building to oral pain, but it could be due to anhedonia and anorexia that can accompany tumor progression.

      Assaying for touch and heat/cold sensitivity in the oral cavity is a critical aspect of studying head and neck cancer that needs to be addressed. However, in rodents these assays are not trivial given that any touch/heat/cold in the area of the tumor (oral cavity) impacts the sensitive whiskers in that region which directly influence these assays. Thus, we have been refining assays (e.g., OPAD, facial von Frey) to address these important questions. The findings from these studies are beyond the scope of this manuscript.

      The reviewer makes a good point about anhedonia and anorexia. The following text has been added to the results section:

      Pain-induced anhedonia is mediated by changes in the reward pathway. Specifically, in the context of pain, dopaminergic neurons in the ventral tegmental area (VTA) become less responsive to pain and release less serotonin.  This decreased serotonin results in disinhibition of GABA release; the resulting increased GABA promotes an increased inhibitory drive leading to anhedonia  82 and, when extreme, anorexia. Carprofen and buprenorphine treatments completely reversed nesting behavior and significantly improved eating. Inflammation 83 and opioids 84 directly influence reward processing and though our tracing studies did not indicate that the tumor-brain circuit includes the VTA, this brain region may be indirectly impacted by tumor-induced pain in the oral cavity. Thus, an alternative interpretation of the data is that the effects of carprofen and buprenorphine treatments on nesting and food consumption may be due to inhibition of anhedonia (and anorexia) rather than, or in addition to, relieving oral pain.

      (5) Comment on why only males were used in this study.

      Please see response to public reviews.

      Reviewer #1:

      (1) Please provide a justification for the use of exclusively male mice and expand in the discussion if there is potential for these findings to be directly applicable to female mice as well.

      Please see response to public reviews.

      The following text has been added to the discussion:

      Head and neck cancer is predominantly a cancer in males; it occurs in males three times more often than in females 123, and this disparity increases in certain parts of the world. While smoking cigarettes and drinking alcohol are risk factors for HPV negative head and neck squamous cell carcinoma, even males that do not smoke and drink have a higher susceptibility for this cancer than females 124,125. Thus, our studies used only male mice. However, we do recognize that females also get this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist in males and females.

      (2) When discussing the results shown in Figure 2, please include some mention of Fus, since it was the highest expressed transcript.

      The following text has been added to the results section regarding Fus.

      The gene demonstrating the highest increase in expression, Fus, was of particular interest; it increases in expression within DRG neurons following nerve injury and contributes to injury-induced pain 51,52. Of note, we purposefully used whole trigeminal ganglia rather than FACS-sorted tracer-positive dissociated neurons to avoid artificially imposing injury and altering the transcript levels of these cells 53,54. Thus, significantly elevated expression of Fus by ipsilateral TGM neurons from tumor-bearing animals suggests the presence of neuronal injury induced by the malignancy. This is consistent with our previous findings 55 and those of others 56 showing that tumor-infiltrating nerves harbor higher expression of nerve-injury transcripts and neuronal sensitization.

      (3) In line 197 please clarify the mice used. Were all mice tumor-bearing and some had nociceptors ablated, or was there a control (no tumor) group as well?

      Line 197 refers to Figure 4D. In this figure, panels B-D show quantification of cFos and ΔFosB in the spinal nucleus of the TGM (SpVc), the parabrachial nucleus (PBN) and the central nucleus of the amygdala (CeA). These data are from C57BL/6 and TRPV1cre::DTAfl/wt animals, all of which were tumor-bearing. Supplementary Figure 3C also shows quantification of cFos and ΔFosB, but these data are from control, non-tumor-bearing animals. The fact that controls are non-tumor-bearing has been added to the supplemental figure legend, and the text of the results section has been clarified as follows.

      While Fos expression was similar between non-tumor bearing mice of the two genotypes (Supplemental Fig. 3C-E), the absence of nociceptor neurons in tumor-bearing animals decreases cFos and ΔFosB in the PBN, and ΔFosB in the SpVc (Fig. 4B, C).

      (4) Overall it would improve the readability of the figures if the colors for the IHC channels were on the image itself and not exclusively in the figure legend.

      The colors for all the staining have been added to each panel.

      (5) It is not a problem that complete cartography was not done, but please include a justification for why the brain regions that were focused on were chosen.

      In order to ensure that our neural tracing technique captured only nerves present within the tumor bed, we restricted the injection of tracer to only 2 µl. We demonstrated that this small volume did not leak out of the tumor (Figure 1) and thus any tracer labeled neurons we identified were deemed as being connected in a circuit to nerves in the tumor bed. While we acknowledged that this calculated technical approach restricted our ability to tracer label all neurons in the tumor bed (as well as those they share circuitry with), it ensured no tracer leakage and inadvertent labeling of non-tumoral nerves. In non-tumor animals injected with 10 µl of tracer, labeled regions in the brain included the spinal nucleus of the trigeminal, the parabrachial nucleus, the central amygdala, the facial nucleus and the motor nucleus of the trigeminal. The regions that were tracer positive when tumor was injected were limited to the spinal nucleus of the trigeminal, the parabrachial nucleus and the central amygdala. Thus, the regions in the brain that we focused on were the areas that became tracer-positive following injection of tracer into the tumor.

      (6) Were the cells that were injected cultured in media with 10% fetal calf serum? If so was any inflammatory response seen? If not please state in the methods section the media that cells for injection were cultured in.

      The cells injected into animals were cultured in media containing 10% fetal calf serum. When cells are harvested for tumor injections, they are first washed two times with PBS and then trypsinized to detach the cells from the plate. Cells are collected, washed again with PBS and resuspended with DMEM without serum; this is what is injected into animals. We harvest cells in this way in order to eliminate any serum being injected into mice. This information has been added to the Methods section.

      (7) Would any of the differences in drug treatment (Carprofen vs Buprenorphine) be due to the differing routes of administration and metabolism of the drugs?

      Since carprofen and buprenorphine each resulted in similar behavioral impacts (nesting and wheel running), their different routes of administration seem to play a minor or no role in the behaviors assessed.

      (8) Please include in the methods section the specific approach and software that was used for processing calcium imaging data and calculating a relative change in fluorescence.

      The specific approach used for processing calcium imaging data and calculating relative change in fluorescence as well as the software used are all included in the methods section. Please see below:

      Ca2+ imaging. TGM neurons from non-tumor and tumor-bearing animals (n=4-6 mice/condition) were imaged on the same day. Neurons were incubated with the calcium indicator, Fluo-4AM, at 37°C for 20 min. After dye loading, the cells were washed, and Live Cell Imaging Solution (Thermo-Fisher) with 20 mM glucose was added. Calcium imaging was conducted at room temperature. Changes in intracellular Ca2+ were measured using a Nikon scanning confocal microscope with a 10x objective. Fluo-4AM was excited at 488 nm using an argon laser with intensity attenuated to 1%. The fluorescence images were acquired in the confocal frame (1024 × 1024 pixels) scan mode. After 1 min of baseline measurement, capsaicin (300 nM final concentration) was added. Ca2+ images were recorded before, during and after capsaicin application. Image acquisition and analysis were achieved using NIS-Elements imaging software. Fluo-4AM responses were standardized and shown as percent change from the initial frame. Data are presented as the relative change in fluorescence (ΔF/F0), where F0 is the basal fluorescence and ΔF = F - F0, with F being the measured intensity recorded during the experiment. Calcium responses were analyzed only for neurons responding to ionomycin (10 µM, positive control) to ensure neuronal health. Treatment with the cell permeable Ca2+ chelator, BAPTA (200 µM), served as a negative control.
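
      The relative change in fluorescence defined above (the difference between the measured intensity F and the basal fluorescence F0, divided by F0) can be illustrated with a short sketch. It is not the NIS-Elements procedure: it assumes F0 is taken as the mean of a few baseline frames, and the function name, frame count and intensity values are hypothetical.

```python
def relative_fluorescence_change(trace, n_baseline_frames):
    """Return (F - F0) / F0 for every frame of one neuron's intensity trace.

    F0 is assumed here to be the mean intensity over the first
    n_baseline_frames frames, recorded before the stimulus is added.
    """
    if n_baseline_frames < 1 or n_baseline_frames > len(trace):
        raise ValueError("n_baseline_frames must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline_frames]) / n_baseline_frames
    if f0 == 0:
        raise ValueError("basal fluorescence F0 must be non-zero")
    return [(f - f0) / f0 for f in trace]


if __name__ == "__main__":
    # Hypothetical raw intensities: three baseline frames, then a response.
    raw_trace = [98.0, 100.0, 102.0, 150.0, 210.0, 180.0, 130.0]
    change = relative_fluorescence_change(raw_trace, n_baseline_frames=3)
    # Multiply by 100 to express the response as percent change from baseline.
    print([round(100 * value, 1) for value in change])
```

      Using a single initial frame as F0, as the phrase "percent change from the initial frame" suggests, corresponds to `n_baseline_frames=1`.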

      (9) Suggestions for Figure 1:

      - In Figures 1C, D, E, include labels for the days of tumor harvest.

      - Please make the size of the labels the same for 1K and 1L and align them.

      - Microscopy image in Figure 1L for SpVc looks like it may be at a different magnification.

      - If possible, include (either in the figure or the supplement) IHC images staining for Dcx and tau, which would complement the western blot data.

      The requested changes to the figures have been made. Unfortunately, we do not have Dcx and tau IHC staining of the day 4, 10 and 20 tumors.

      (10) Suggestions for Figure 2:

      - Include directly onto the graph in Figure 2a the legend for tumor-bearing (red) and non-tumor bearing (blue).

      - Keep consistent between Figure 2G and 2H/I if the tumor/nontumor will be labeled as T/N or Tumor/Control.

      The requested changes to the figures have been made.

      (11) Suggestions for Figure 3:

      - An example trace of calcium signal would complement Figure 3G, H well.

      Example tracings of calcium signal are already provided in Supplementary Figure 3A and B.

      Reviewer #2:

      (1) While the use of male mice is acknowledged, there is not a rationale for why female mice were not included in the study.

      Please see the response to Reviewer #1 (first question).

      (2) Criteria for euthanasia should be described in the Methods. This is especially needed for interpreting the survival curve in Figure 4H.

      Criteria for euthanasia in our IACUC approved protocol include:

      - maximum tumor volume of 1000 mm³

      - edema

      - extended period of weight loss progressing to emaciation

      - impaired mobility or lesions interfering with eating, drinking or ambulation

      - rapid weight loss (>20% in 1 week)

      - weight loss at or more than 20% of baseline

      In addition to tumor size and weight loss, we use the body condition score to evaluate the state of animals and to determine euthanasia.  These details have been added to the Methods section.

      (3) At what stage in cancer progression were the Fos studies conducted for Figure 4A-D?

      The brains used for Fos staining (Fig 4B-D) were harvested at week 5 post-tumor implantation.

      (4) For Fos counts, what are the bregma coordinates for the sections that were quantified?

      SpVc: -7.56 to -8.24 mm

      PBN: -4.96 to -5.52 mm

      CeA: -0.82 to -1.94 mm

      (5) Statistics are needed for the claim in Lines 171-173.

      The statistical analysis of Fos staining from tumor-bearing and non-tumor bearing brains is included in Figure 3D-F. The statistical analysis of ex vivo Ca2+ imaging of brains from tumor-bearing and non-tumor bearing animals is included in Figure 3 I and J.

      (6) How long was the baseline period for weight and food intake measurements? How long were the animals single-housed before taking the baseline measurements?  

      Baseline weight and food intake were measured over 2 weeks, and animals were singly housed for 2 weeks before these baseline measurements began (a total of 4 weeks).

      Minor:

      (7) The authors might consider rewording the sentence on lines 59-62, given that it is abundantly clear from rodent studies that both the tumor and chemotherapy are associated with adverse behavioral outcomes.

      We have reworded the sentence as follows:  The association of cancer with impaired mental health is directly mediated by the disease, its treatment or both; these findings suggest that the development of a tumor alters brain functions.

      (8) Line 212 needs a space between the two sentences.

      This has been fixed.

      (9) Font size in Figure 2 is not consistent with the other figures.

      This has been fixed.

      (10) "DAPI" is more conventional than "DaPi".

      This has been fixed.

      Editorial Comments and Suggestions:

      (1) The Abstract would be better if it were more concise, e.g. ~175 words.

      The abstract has been shortened as requested and now reads:

      Cancer patients often experience changes in mental health, prompting an exploration into whether nerves infiltrating tumors contribute to these alterations by impacting brain functions. Using a mouse model for head and neck cancer and neuronal tracing we show that tumor-infiltrating nerves connect to distinct brain areas. The activation of this neuronal circuitry altered behaviors (decreased nest-building, increased latency to eat a cookie, and reduced wheel running). Tumor-infiltrating nociceptor neurons exhibited heightened calcium activity and brain regions receiving these neural projections showed elevated cFos and delta FosB as well as increased calcium responses compared to non-tumor-bearing counterparts. The genetic elimination of nociceptor neurons decreased brain Fos expression and mitigated the behavioral alterations induced by the presence of the tumor. While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running indicating that pain is not the exclusive driver of such behavioral shifts. Unraveling the interaction between the tumor, infiltrating nerves, and the brain is pivotal to developing targeted interventions to alleviate the mental health burdens associated with cancer.

      (2) Lines 28, 104, 258, 486, 521, and many other places, "utilized" should be "used" because the former refers to an application for which it is not intended, e.g. a hammer was utilized as a doorstop.

      The requested changes have been made.

      (3) Lines 32 and 73, it is not clear whether the basal activity is heightened or whether excitability is increased. "manifest" might be better than "harbor" on line 73.

      We have changed the wording in the abstract to be clearer. Moreover, our finding that TGM neurons from tumor-bearing animals have increased expression of the σ1-receptor and phosphorylated TRPV1 (Fig 2G-I) indicates that these neurons have increased excitability.

      (4) Line 34 and elsewhere, it would be better to refer to Fos because there is no need to distinguish cellular, cFos, from viral, vFos, in this context.

      The requested changes have been made.

      (5) Line 38, It would be better to refer to what was actually measured rather than "oral movements".

      The requested changes have been made. The sentence now reads: “While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running.”

      (6) Line 84, CXCR3-null mouse on a C57BL/6 background.

      The requested change has been made.

      (7) Lines 86,129 wild-type, male mice.

      The requested change has been made.

      (8) Lines 114-115, the brackets are not necessary.

      The requested change has been made.

      (9) Lines 118, 384, 409, 527, 589, 971, 974 always leave a space between numbers and units. Use Greek u for micro.

      The requested change has been made.

      (10) Lines 123-124, it is not clear that there is meaningful labeling within the CeA.

      We have replaced this image with a more representative one of the CeA from a tumor-bearing animal with clear tracer labeling.

      (11) Lines 125, 138, and 246 transcription was not measured, only transcript levels were measured.

      The requested changes have been made.

      (12) Line 133, I think >4 fold is meant.

      Thank you for catching that. I have fixed it to >4 fold.

      (13) Line 165, single-time-point assessment (add hyphens).

      The requested change has been made.

      (14) Line 181 and elsewhere including figure, the superscripts refer to alleles of the genes; hence approved gene names should be used in italics (as in Methods), TRPV1-Cre:: Floxed-DTA (without italics) would be acceptable.

      The requested changes have been made.

      (15) Line 182, nociceptor-neuron-ablated mice (add hyphens).

      The requested changes have been made.

      (16) Line 197, It is not clear that the "speed" of food disappearance was measured or that it is due to oral pain vs loss of appetite.

      The reviewer makes a good point. We have changed the sentence to read:

      To evaluate the effects of this disruption on cancer-induced behavioral changes, we assessed the animals’ general well-being through nesting behavior 32 and anhedonia using the cookie test 76,77, as well as  body weight and food disappearance as surrogates for oral pain and/or loss of appetite.

      (17) Line 199, The reduced tumor growth after ablation could account for most of the changes in the other parameters that were measured.

      We have graphed the nesting scores and time-to-interact with the cookie as a function of tumor volume.  These data are now included as Supplemental Figure 4 and suggest that at the same tumor volume, nesting scores and times-to-interact with the cookie are different between the groups.

      (18) Line 204 TPVP1 spelling. Is the TGN smaller after ablation of half of the neurons?

      The requested change has been made.

      (19) Line 235, "now" is not necessary.

      The requested change has been made.

      (20) Lines 238-239 and elsewhere, a few references as to why the TGN-SpVc-PBN-CeA circuit is relevant would be helpful.

      The following references have been added regarding the relevance of this circuit to behavior:

      Molecular Brain 14: 94 (2021) (PMID 34167570)

      Neuropharmacology 198: 108757 (2021) (PMID 34461068)

      Frontiers in Cellular Neuroscience 16: 997360 (2022)  (PMID 36385947)

      Neuropsychopharmacology  49(3): 508-520 (2024) (PMID 37542159)

      (21) Lines 371, 434 and Figures, gm should be g or grams in scientific usage. Include JAX lab stock numbers for these mouse lines.

      The requested changes have been made.

      (22) Line 432, removing food for one hour is not a fast.

      The sentence has been reworded as follows: One hour prior to testing, mouse food is removed and the animals are acclimated to the brightly lit testing room.

      (23) Line 476, 5-um sections (add hyphen).

      The hyphen has been added.

      (24) Lines 988, and 1023, DAPI are usually shown this way.

      The requested change has been made.

      (25) Figure 1K, add Bregma levels to figures.

      SpVc: -8.12 mm

      PBN: -5.34 mm

      CeA: -1.34 mm

      (26) Figure 3 line 1033, "area under the curve" What curve was examined?

      The curve examined was the change in fluorescence over time. This curve has been added as Supplemental Figure 3C.
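
      As an illustration of how the area under such a fluorescence-versus-time curve can be obtained, the sketch below applies the trapezoidal rule. Whether the analysis software integrates the curve in exactly this way is an assumption, and the function name and sample values are hypothetical.

```python
def area_under_curve(times, values):
    """Trapezoidal-rule area under a sampled curve.

    times  -- sample times in increasing order (for example, seconds)
    values -- curve value at each time (for example, relative fluorescence change)
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (values[i] + values[i - 1]) / 2
        area += width * mean_height
    return area


if __name__ == "__main__":
    # Hypothetical response sampled every 10 s after stimulus addition.
    sample_times = [0, 10, 20, 30, 40]
    sample_values = [0.0, 0.6, 1.1, 0.7, 0.2]
    print(area_under_curve(sample_times, sample_values))
```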

      (27) Figure 3B, the circled area is the lateral PBN. At first glance, I thought scp was meant as the label for the circled area.

      Scp is noted in the figure legend as a landmark.

    1. Author response:

      Data replicability

      There are no replicates contained in the manuscript. (Reviewer #1)

      We respectfully disagree with this statement. In this manuscript, we included both cell and animal replicates. For cell replicates, we analyzed over 50,000 cells using RNAscope and over 10,000 cells using RNAseq, employing two independent methods on different animals. We believe this extensive analysis is sufficient by any standard. Regarding animal replicates, we generated four different transgenic lines (two knock-in lines and two BAC transgenic lines), which is an uncommon and rigorous effort. We analyzed dozens of animals, consistently observing the expression pattern of Smim32 and its derived transgenes across multiple experiments, including crosses between transgenics and various reporter lines, which is again an uncommon and rigorous effort. These experiments were conducted on animals from different litters to ensure robustness. Additionally, our longitudinal study, which includes 13 animals harvested at two-day intervals from E16 to P20, provides further evidence of the consistency of our data.

      However, to underscore the consistency of endogenous Smim32 expression, when submitting a revised manuscript, we will present Smim32 expression levels across individuals in single-cell RNA-seq data. Furthermore, we will pool data from different transgenic animals to demonstrate interindividual variability in the claustrum of adult animals. 

      Additional examples of female mice should also be included and separately quantified. (Reviewer #1)

      We initially analyzed both males and females for one line (the Smim32-Cre knock-in line). Since we observed no differences between males and females (which we will note in the revised manuscript), we subsequently limited our analyses to males to minimize the use of animals. 

      Claustrum definition

      Weaknesses lie in poor anatomical definitions of the claustrum (and endopiriform nucleus). (Reviewer #2)

      No other orthogonal approaches were used to define the claustrum, such as retrograde neuroanatomical tracing from cortex. (Reviewer #3)

      We share the reviewers’ opinion that the claustrum (CLA) and endopiriform nucleus (EN) are poorly defined anatomically in rodent brains due to the limited development of white matter tracts. This ambiguity has led to many conflicting descriptions of CLA/EN boundaries in various papers and atlases, including those by Paxinos and the Allen Brain Institute. Notably, the Allen Institute frequently updates the shape and anatomical location of the CLA/EN in their reference atlas, resulting in different websites displaying various versions (as illustrated in rebuttal figure 1 at comparable levels of the anteroposterior brain axis). It remains uncertain which version would most effectively satisfy the entire scientific community, if any. Indeed, after many years of working on these structures and surveying the literature, we regret to note that there is currently no consensus on the anatomical definition of the CLA and EN, even among expert laboratories using tracing or staining methods. At one end of the spectrum, some authors define the CLA as a small nucleus that could be, for example, characterized by the PV-rich plexus. At the other end, other authors consider it part of a larger complex that includes the EN and extends dorsally to the S2 cortex. Additionally, differing definitions of the core and shell regions, as well as the precise anteroposterior extent of the nucleus, further complicate the issue.

      Author response image 1.

      Comparison of CLA and EN shapes in two recent versions of the Allen brain atlas

      Given this lack of consensus, we deliberately opted for a molecular definition of the claustrum and its projection neurons. We used a set of well-documented canonical markers for the claustrum and neighboring neurons to determine the expression pattern of Smim32. The claustrum-specific markers we selected (Nr4a2, Lxn, Gnb4, Car3, etc.) have been extensively studied and allow us to distinguish claustrum projection neurons from neighboring and intermingled populations. Although none of these individual markers are exclusively specific to CLA and EN neurons, the combined expression of these markers provides greater confidence in identifying the different neuronal populations in space.

      Smim32 expression is used to define claustrum anatomical boundaries, rather than first using several structural, molecular, and connectivity lines of evidence to define the claustrum anatomically and then to assess whether Smim32 expression fits within this anatomical definition. (Reviewer #2)

      Contrary to the reviewer's suggestion, we do not define the claustrum based on Smim32 expression. Instead, Figures 1 and 2 demonstrate that Smim32 expression is highly correlated with the expression of known claustrum markers (Nr4a2, Lxn, Gnb4, Car3, etc.), both regionally and at the cellular level. As suggested by Peng et al. (2021, Fig. 4 and Extended Data Fig. 11), this population of cells, which includes the claustrum, a specific subset of cells in cortical layer 6, and the dorsal endopiriform nucleus, forms a discrete group of neurons sharing the same transcriptomic identity. Given what is known about the connectivity of claustrum and endopiriform nucleus projection neurons, this population obviously includes neurons projecting to various areas, likely fulfilling distinct functions. Whether these cells should be subdivided based on projection area, developmental origin, or structural features is beyond the scope of this article.

      Specificity issues

      Cre/Flp expression driven by the Smim32 promoter is present in non-claustrum regions, including the neighboring cortex, striatum, and endopiriform nucleus as well as the more distant thalamic reticular nucleus. (Reviewer #2)

      The Smim32 gene is not specific to the claustrum. (Reviewer #3)

      We do not claim that endogenous Smim32 expression is exclusive to the claustrum or that the knock-in lines, by themselves, are sufficient to isolate claustrum neurons without combined approaches based on the transgenic lines presented here. However, there are significant differences in the expression pattern between endogenous Smim32 and the expression of Cre in the various derived transgenic lines, which might not have been clear in the current manuscript. Notably, there is no expression of Cre in the striatum and the thalamic reticular nucleus, and only sparse expression in the endopiriform nucleus in Tg61(Smim32-cre). Each transgenic line provides different levels of overlap with the endogenous Smim32 expression, with the Tg61(Smim32-cre)  line allowing for the most specific genetic access to claustrum neurons. Again, for greater specificity, any of these lines could be used in combined approaches, such as viral targeting (as shown in Figure 6A and B) or using transgenic intersectional (dual recombinase) approaches based on Cre- and Flp-expressing mice with an overlap in the claustrum, leading to circuit-specific and/or claustrum-only labeling.

      This means that our claims are supported by the observed data. However, we acknowledge that we may not have clearly explained the specificity of the random transgenes, which could have led some reviewers to believe that “the data do not support the claims”.

      We will clarify these points in the revised manuscript and include additional examples and quantifications to highlight the differences between endogenous Smim32 expression and Cre expression in the transgenic Tg61(Smim32-cre)  line.

      Regarding Cre-expressing cells in the neighboring cortex (layer 6 projection neurons), these cells are genetically distinct from other layer 6 cortical neurons and express the same canonical markers as claustrum projection neurons, likely sharing also the same transcriptomic identity. We will provide a more detailed characterization of these cells in the revised manuscript.

      Since Smim32 driven recombinase (in 61 or 62lrod) is not exclusively expressed in the claustrum, it is not clear how Smim32 is an advantage over possible Nr4a2 or, the more selective, GNB4 Cre driver lines. (Reviewer #2)

      Over the years, we have found a limited number of Cre lines used in the literature for targeting claustrum neurons. These include Gnb4-cre, Slc17a6-cre (also known as Vglut2-cre), Egr2-cre, Tg(Tbx21-cre), Ntng2-cre, Cux2-cre and Esr2-cre lines. We have not found any study describing and/or using an Nr4a2-cre line. Although a Nr4a2-Dre line exists (that we have studied in our laboratories), caution is warranted in its use, as it lacks the complete coding sequence of the Nr4a2 gene.

      One problem with Nr4a2 is its documented expression in the adjacent Layer 6b cortical neurons, which rules it out as a suitable candidate to selectively target the claustrum. Furthermore, Nr4a2 is also expressed in a majority of the endopiriform nucleus neurons, whereas endogenous Smim32 is expressed in a smaller proportion of these cells, and is restricted mainly to the dorsal endopiriform nucleus. These reasons led us to select Smim32 over Nr4a2.

      Author response image 2.

      (A) In situ hybridization for various CLA/EN marker genes. (B) Developmental recombination observed outside the CLA/EN in various cre lines (all data from the Allen brain databases)

      What are the advantages of using the different Smim32-cre lines over the existing Cre lines mentioned above?

      Let’s first consider the Gnb4-cre line, which is considered one of the best available. Although the endogenous Gnb4 gene appears to have a similar expression pattern to Nr4a2, Slc17a6, and Smim32 in the striato-claustro-insular region of adult mice (Rebuttal Figure 2A), the results observed with the Gnb4-cre line either show otherwise or indicate that the Cre line does not fully recapitulate endogenous Gnb4 expression (Rebuttal Figure 3). Indeed, some neurons in the insular cortex, piriform cortex, and putamen express the Cre recombinase (possibly due to low Gnb4 expression not detected in the in situ hybridization data of the Allen Brain Institute or due to nonspecific transgene expression) and will recombine viral vectors injected in adult mice (Rebuttal Figure 3). Therefore, this Cre expression outside the CLA/EN neurons in the Gnb4-cre line complicates data interpretation, depending on the viral injection coordinates and the quantity of injected vectors.

      Author response image 3.

      Specificity of the Gnb4-Cre line tested with viral transduction in adult mice (all data from the Allen Brain Institute database). The top and middle rows display the same data but with different scaling of the lookup tables to highlight either the patterns of axonal projections (top) or the infected neurons themselves (middle). The bottom row shows a higher magnification of the infection site. Note that individual neurons cannot be resolved in experiment 485903475 due to signal saturation.  

      Cre expression in the CLA appears more specific in the various Smim32-cre transgenic lines than in many of the lines mentioned above. Although we have no doubt that the different existing transgenic lines can target CLA neurons, the selectivity of the targeting (for example, the fraction and types of CLA neurons versus potential non-CLA neurons) remains to be fully described for most of the lines. It is particularly true in the case of Tbx21 and Esr2 (used as drivers for the Tg(Tbx21-cre) and Esr2-cre transgenic lines). Tbx21 is not endogenously expressed in adult CLA neurons (evaluated by in situ and RNAseq data) and Egr2, if expressed in the claustrum, is not restricted to CLA neurons as it is an immediate early gene expressed in recently active neurons (Rebuttal Figure 2A). 

      Cre expression in the EN is observed in all Cre-expressing transgenic lines used to target the claustrum (with the exception of Slc17a6-cre). This can naturally be problematic for some approaches. Luckily, the random integrant Tg61(Smim32-cre) we describe in our manuscript shows a strong expression in the claustrum, and very limited expression outside the CLA (a very weak activity in the EN), representing a novel tool with improved claustrum selectivity. An advantage of the Tg61(Smim32-cre) over the Slc17a6-cre is that more CLA neurons can be targeted with the Tg61(Smim32-cre) line. 

      Another advantage of our four transgenic lines is their versatility; they can be used to recombine reporter lines as well as FRT-floxed and loxP-floxed knockouts in limited neuronal populations. They will be employed in the future for intersectional genetics to exclusively target CLA neurons. Existing transgenic lines cannot offer these possibilities because their marker genes are broadly expressed in the brain during embryogenesis, so that recombination affects a large number of non-CLA/EN neurons. This is evident in the Gnb4-cre and Slc17a6-cre lines crossed with the Ai14 reporter line expressing the fluorescent protein tomato (Rebuttal Figure 2B, right panels). Similar observations have been made for the Ntng2-Cre and Cux2-cre lines (see the Allen Brain Institute database for these data). Alternatively, inducible recombinase systems, such as the Gnb4-IRES2-CreERT2-D line, could be used. However, the Gnb4-IRES2-CreERT2-D line requires tamoxifen to induce Cre recombination, which can be problematic depending on the research context, and it also shows recombination in the absence of tamoxifen treatment (see experiments 560948627 and 560948194 in the Allen Brain database).

      It is unclear how Smim32 relates to claustrum in other mammalian species (e.g. primates) (Reviewer #3)

      As mentioned in the last paragraph of the introduction of the initial manuscript, Smim32 is specifically expressed in the claustrum of a primate species, Homo sapiens (reference 37 of the initially submitted manuscript).

      Availability of the transgenic mice

      These mice should be made available to the community through commercial vendors. (Reviewer #1 and #2 in private comments)

      We are pleased to see that two of the three reviewers would like to see these mice available. These mice will not be kept for ourselves, and we will distribute them at some point in time, but this will naturally occur after the publication of the revised manuscript.

      Critical comments on discussion and other topics

      A clear description of the search in the Allen Mouse Brain Atlas is missing. A search for Smim32 in the ISH mouse atlas did not provide any hits and so it would be useful to include in the methods or results section the exact query used for examination of Smim32 expression as well as other genes identified in this process. (Reviewer #2)

      Smim32 has been referred to by different names in various versions of the mouse genome. For the readers not versed in navigating genomes and annotations, before being officially named Smim32, this gene was originally called Gm6753 (as noted in the Allen Brain Institute database, see Rebuttal Figure 2A for an example of their in situ data) and later Gm45623.

      Several sentences highlighting the shortfalls of other approaches are overstated and should be toned down. (Reviewer #1)

      Very concerning is problematic language in the abstract and introduction sections that diminish the impact of several published studies (not cited) that have led to important findings regarding claustrum function. The authors create an argument that all the research performed thus far on the claustrum is unreliable because targeting the structure has been sub-optimal. (Reviewer #2)

      A more balanced discussion of the strengths and weaknesses of these mice should be included. (Reviewer #1)

      We regret if our choice of language inadvertently appeared to undermine the contributions of our colleagues; that was certainly not our intention. The paragraph in question was meant to address certain studies that we believe have led to inconsistent findings and unreliable data due to a lack of rigorous methodology in targeting claustrum projection neurons. To avoid singling out specific works, we chose not to cite them directly. We understand that some colleagues whose research does not fall under the “various cases” mentioned may feel unfairly targeted by this statement. We will revise this section to better clarify our intent and ensure it is respectful of all contributions. We will rephrase passages in the abstract, introduction, and discussion to provide a balanced view of the strengths and weaknesses of these mice.

      Our main goal is to provide tools to specifically target claustrum cells based on their transcriptomic identity, which we believe is the best means to assess the function of any neuronal population. Due to the intermingling of claustrum neurons with neighboring populations, employing stereotaxic injections in the claustrum without genetic segregation will always infect and label physically adjacent cells that do not belong to the claustrum, ontologically and functionally speaking. 

      Similarly, targeting claustrum neurons retrogradely by injecting into claustrum projection sites likely labels neurons from different populations. For instance, as reviewer 1 mentions Erwin et al. (2021), infecting retrosplenial projections without genetic specificity labels many claustrum Synpr+ neurons (considered the claustrum core), a small proportion of claustrum Nnat+ neurons (considered the claustrum shell by some, and non-claustrum neurons by others), and some neighboring cortical L6b neurons. These three populations have very different transcriptomic identities, connectivity patterns, and likely distinct functions.

      Thus, we believe that genetic specificity provides an important added value for selectively targeting the claustrum or claustro-insular complex.

      A better characterization of all data should be undertaken. (Reviewer #1)

      Having generated hundreds of transgenic lines over the years, we have never performed a more thorough analysis of transgenic lines, nor do we recall reading a publication that evaluates the expression pattern of transgenes in mice at such a precise level. We therefore do not see exactly what the reviewer means by this remark. It is possible that, not being native English speakers, we failed to grasp some form of joke.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      For the colony analysis, it is unclear from the methods and main text whether the initial individual sorted colonies were split and subject to different conditions to support the claim of bi-potency. The finding that 40% of colonies displayed tenogenic differentiation, may instead suggest heterogeneity of the sorted progenitor population. The methods as currently described, suggest that two different plates were subject to different induction conditions. It is therefore difficult to assess the strength of the claim of bi-potency.

      Thanks for your valuable comment. We are sorry for the confusing description of the colony assay. We first obtained CD29+/CD56+ cells by FACS. These freshly isolated cells were then randomly seeded into 96-well plates at a density of 1 cell/well. Subsequently, the single cell in each well was cultured in growth medium for ten days to form a colony. Myogenic induction was then performed in three 96-well plates and tenogenic induction in another three 96-well plates for subsequent analyses. Thus, we agree with your point that the sorted progenitor population could be heterogeneous. Almost all the cells highly expressed the myogenic progenitor genes PAX7/MYOD1/MYF5 (Figure 1g), and over 95% of colonies successfully differentiated into myotubes (Figure 2g). We therefore believe that the CD29+/CD56+ cells obtained were myogenic progenitor cells, and that a subgroup of these cells possessed bi-potency.

      This group uses the well-established CD56+/CD29+ sorting strategy to isolate muscle progenitor cells, however recent work has identified transcriptional heterogeneity within these human satellite cells (ie Barruet et al, eLife 2020). Given that they identify a tenocyte population in their human muscle biopsy in Figure 1a, it is critical to understand the heterogeneity contained within the population of human progenitors captured by the authors' FACS strategy and whether tenocytes contained within the muscle biopsy are also CD56+/CD29+.

      Thanks for your constructive suggestion. We will include more samples to perform scRNA-seq and reanalyze the data.

      The bulk RNA sequencing data presented in Figure 3 to contrast the expression of progenitor cells under different differentiation conditions are not sufficiently convincing. In particular, it is unclear whether more than one sample was used for the RNAseq analyses shown in Figure 3. The volcano plots have many genes aligned on distinct curves suggesting that there are few replicates or low expression. There is also a concern that the sorted cells may contain tenocytes as tendon genes SCX, MKX, and THBS4 were among the genes upregulated in the myogenic differentiation conditions (shown in Figure 3b).

      Thanks for your comment. Each group consisted of three samples for the RNAseq analyses. We apologize for a minor analysis mistake in Figure 3b and Figure 3c, which will be reanalyzed in the revised version. As for contamination by tenocytes, almost all the obtained cells highly expressed the myogenic progenitor markers PAX7/MYOD1/MYF5 (Figure 1g-h). Low expression levels of tendon markers were identified in these cells (Figure 2a-c). Furthermore, although tendon genes were slightly upregulated in myogenic differentiation conditions, these markers were dramatically upregulated in tenogenic differentiation conditions (Figure 2c). Thus, we believe the tenogenic differentiation ability of the sorted cells can mainly be ascribed to CD29+/CD56+ myogenic progenitor cells.

      Reviewer #2 (Public Review):

      scRNAseq assay using total mononuclear cell population did not provide meaningful insight that enriched knowledge on CD56+/CD29+ cell population. CD56+/CD29+ cells information may have been lost due to the minority identity of these cells in the total skeletal muscle mononuclear population, especially given the total cell number used for scRNAseq was very low and no information on participant number and repeat sample number used for this assay. Using this data to claim a stem cell lineage relationship for MuSCs and tenocytes may not convincing, as seeing both cell types in the total muscle mononuclear population does not establish a lineage connection between them.

      Thanks for your constructive suggestion. We will include more samples to perform scRNA-seq and reanalyze the data.

      The TGF-b pathway assay uses a small molecular inhibitor of TGF-b to probe Smad2/3. The assay conclusion regarding Smad2/3 pathway responsible for tenocyte differentiation may be overinterpretation without Smad2/3 specific inhibitors being applied in the experiments.

      Thanks for your comment. We agree and will revise this conclusion in the revised version.

      Reviewer #3 (Public Review):

      Comment: This dual differentiation capability was not observed in mouse muscle stem cells.

      Thanks for your comment. We have explored the tenogenic differentiation potential of mouse MuSCs both in vivo and in vitro. However, only low tenogenic differentiation ability was revealed (Figure 4), which might be due to species differences. It may be more demanding for humans to maintain the homeostasis of the locomotion system and whole-organism locomotion ability over a much longer life span and with a bigger body size. Thus, the current study also indicates that animal studies may not be clinically relevant when investigating human diseases.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Mehrdad Kashefi et al. investigated the availability of planning future reaches while simultaneously controlling the execution of the current reach. Through a series of experiments employing a novel sequential arm reaching paradigm they developed, the authors made several findings: 1) participants demonstrate the capability to plan future reaches in advance, thereby accelerating the execution of the reaching sequence, 2) planning processes for future movements are not independent one another, however, it's not a single chunk neither, 3) Interaction among these planning processes optimizes the current movement for the movement that comes after for it.

      The question of this paper is very interesting, and the conclusions of this paper are well supported by data. However, certain aspects require further clarification and expansion.

      We thank reviewer one for their evaluation of the work.

      (1) The question of this study is whether future reach plans are available during an ongoing reach. In the abstract, the authors summarized that "participants plan at least two future reaches simultaneously with an ongoing reach and that the planning processes of the two future reaches are not independent of one another" and showed the evidence in the next sentences. However the evidence is about the relationship about ongoing reach and future plans but not about in between future plans (Line 52-55). But the last sentence (Line 55-58) mentioned about interactions between future plans only. There are some discrepancies between sentences. Could you make the abstract clear by mentioning interference between 1) ongoing movement and future plans and 2) in between future plans?

      We thank the reviewer for their comment. We have separated the longer sentence in the original abstract into two shorter ones. This should clarify that the two pieces of evidence pertain to the interaction of planning processes.

      (2) I understood the ongoing reach and future reaches are not independent from the results of first experiment (Figure 2). A target for the current reach is shown at Horizon 1, on the other hand, in Horizon 2, a current and a future target are shown on the screen. Inter-reach-interval was significantly reduced from H1 to H2 (Figure 2). The authors insist that "these results suggest that participants can plan two targets (I guess +1 and +2) ahead of the current reach (I guess +0)". But I think these results suggest that participants can plan a target (+1) ahead of the current reach (+0) because participants could see the current (+0) and a future target (+1) in H2. Could the authors please clarify this point?

      We thank the reviewer for raising this point. Our conclusion that “participants can plan two targets ahead of the current reach” is supported by the reduction in Inter-Response Interval (IRI) observed when comparing H2 to H3 in the 75 ms Dwell time condition. Specifically, on average, participants were 16 ms faster when they could see two future targets on the screen (H3) than when they could see only one (H2). To clarify this in the paper, we have revised the wording in line 124 to explicitly state that the conclusion pertains to the 75 ms Dwell time condition. Additionally, we emphasize that the strongest evidence for planning two future targets comes from the experiment shown in Figure 3.

      (3) Movement correction for jump of the +1 target takes longer time in H3 compared to H2 (Figure 4). Does this perturbation have any effect on reaching for +2 target? If the +1 jump doesn't affect reaching for +2 target, combined with the result that jump of the +2 target didn't affect the movement time of +1 target (Figure 3C), perturbation (target jump) only affects the movement directly perturbed. Is this implementation correct? If so, does these results support to decline future reaches are planned as motor chunk? I would like to know the author's thoughts about this.

      In the experiment presented in Figure 4, once we jumped the +1 target, the reach to that target was changed and participants replanned a corrective movement to the new location of the +1 target. This was usually followed by a longer-than-usual pause at the new location of the +1 target before resuming the sequence and finishing the trial. Consequently, in these jump trials, it was impossible to compare the +2 reach to no-jump trials, as the normal sequence of movement was disrupted, and the reach to the +2 target originated from a different starting location. Nevertheless, we addressed the possibility that the two future reaches were planned as a chunk with the analysis shown in Figure 5. There we showed that a displacement of the +2 target did not influence the reach to the +1 target, indicating that the movement plans could be updated independently.

      (4) Any discussion about Saccade position (Figure 7)?

      We thank reviewer 1 for this important comment. The following discussion section is added for the gaze position results.

      In our sequence task, participants switched their gaze location only once per reach, suggesting that information about the location of the next target is perceived parafoveally (Figure 7A). This observation aligns with previous studies (Clavagnier et al., 2007; González-Alvarez et al., 2007; Sivak and MacKenzie, 1990) that found participants keep their visual attention on the current sequence item and can perceive the location of spatial targets even when foveal vision is occluded. However, when comparing gaze locations for conditions Horizon >1, we observed that participants systematically biased their gaze location based on the sequence context. The gaze position shifted toward the next target, potentially allowing for more accurate location estimation (Figures 7C-D). Notably, changes in gaze location were observed even in Horizon 2, despite no changes in the curvature of hand movements in this horizon (Figure 6B). This suggests that information about the next target may first be available in the circuitry that controls eye movements and later in the cortical areas that control voluntary upper limb movements. Further control studies are required to investigate this hypothesis.

      Reviewer #2:

      Summary:

      In this work, Kashefi et al. investigate the planning of sequential reaching movements and how the additional information about future reaches affects planning and execution. This study, carried out with human subjects, extends a body of research in sequential movements to ask important questions: How many future reaches can you plan in advance? And how do those future plans interact with each other?

      The authors designed several experiments to address these questions, finding that information about future targets makes reaches more efficient in both timing and path curvature. Further, with some clever target jump manipulations, the authors show that plans for a distant future reach can influence plans for a near future reach, suggesting that the planning for multiple future reaches is not independent. Lastly, the authors show that information about future targets is acquired parafoveally--that is, subjects tend to fixate mainly on the target they are about to reach to, acquiring future target information by paying attention to targets outside the fixation point.

      The study opens up exciting questions about how this kind of multi-target planning is implemented in the brain. As the authors note in the manuscript, previous work in monkeys showed that preparatory neural activity for a future reaching movement can occur simultaneously with a current reaching movement, but that study was limited to the monkey only knowing about two future targets. It would be quite interesting to see how neural activity partitions preparatory activity for a third future target, given that this study shows that the third target's planning may interact with the second target's planning.

      Strengths:

      A major strength of this study is that the experiments and analyses are designed to answer complementary questions, which together form a relatively complete picture of how subjects act on future target information. This complete description of a complex behavior will be a boon to future work in understanding the neural control of sequential, compound movements.

      We thank the reviewer for their thorough reading of our work.

      Weaknesses:

      I found no real glaring weaknesses with the paper, though I do wish that there had been some more discussion of what happens to planning with longer dwell times in target. In the later parts of the manuscript, the authors mention that the co-articulation result (where reaches are curved to make future target acquisition more efficient) was less evident for longer dwell times, likely because for longer dwell times, the subject needs to fully stop in target before moving to the next one. This result made me wonder if the future plan interaction effect (tested with the target jumps) would have been affected by dwell time. As far as I can tell, the target jump portion only dealt with the shorter dwell times, but if the authors had longer dwell time data for these experiments, I would appreciate seeing the results and interpretations.

      We thank the reviewer for raising this point. In our time (Figure 2) and curvature analysis (Figure 6), we collected data with five levels of the horizon and three levels of dwell time to explore the space of parameters and to see if there is any interaction between dwell time and the horizon of planning the future targets. A priori, we expected that the full stop in each target imposed by the 400 ms dwell time would be long enough to remove any effect of future targets on how the current move is executed. In line with our initial hypothesis, the systematic curvature of reaches based on the future target was smaller at longer dwell times (Figure 6E). Nevertheless, we observed a significant curvature even at the 400 ms dwell time. Based on this observation, we expect that running the jump experiments (Figures 4 and 5) at longer dwell times would lead to the same pattern of results but with a smaller effect size, since longer dwells break the interdependence of sequence elements (Kalidindi & Crevecoeur, 2023). In the end, for the jump experiments, we limited our experimental conditions to the fastest dwell time (75 ms dwell) since we were conceptually interested in situations where movements in the sequence are maximally dependent on each other.

      Beyond this, the authors also mentioned in the results and discussion the idea of "neural resources" being assigned to replan movements, but it's not clear to me what this might actually mean concretely. I wonder if the authors have a toy model in mind for what this kind of resource reassignment could mean. I realize it would likely be quite speculative, but I would greatly appreciate a description or some sort of intuition if possible.

      Our use of the term "neural resources" is inspired by classic psychology literature on how cognitive resources such as attention and working memory are divided between multiple sequence components. Early studies on working memory suggest that human participants can retain and manipulate a fixed number of abstract items in working memory (Miller, 1956). However, more recent literature postulates that a specific number of items does not limit working memory, rather, it is limited by a finite attentional resource that is softly allocated to task items.

      Here we borrowed the same notion of soft distribution of resources for the preparation of multiple sequence items. A large portion of our observations in this paper, and also in previous work on sequence production, can be explained by a simple model that assumes one central planning resource that is “softly” divided between sequence elements when participants see future items of the sequence (Author Response Image 1). The first sequence element receives the majority of the resources and is planned the most. The rest of the sequence receives the remaining planning resources in an exponentially decaying manner, so that these movements are prepared during the execution of the ongoing movement. Once the ongoing movement is over, the resource is transferred to the next sequence item, and this process is repeated until the sequence is over. Assignment of planning resources to future items explains why participants are faster when seeing future items (Figure 2). But this comes with a cost: if the ongoing movement is perturbed, the replanning process is delayed, since some of the resources are occupied by future planning (Figure 4). This naturally leads to the question of how this resource allocation is implemented in neural tissue. To address this, we are conducting the same sequence task with the horizon in non-human primates (NHPs), and the investigation of these neural implementation questions will be the focus of future studies.

      Author response image 1.

      Basic diagram showing a soft distribution of a limited planning resource. The diagram shows a Horizon 3 condition in which two future reaches (+1 and +2) are planned while executing a movement (+0). The majority of resources is assigned to the execution of the ongoing movement while the rest is distributed for planning future movements. Once the movement is over, the chain of preparation and execution moves forward.
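
The soft-allocation idea described above can be illustrated with a minimal sketch. This is not the authors' model code: the function names, the decay factor of 0.5, and the normalisation to a total of 1.0 are all assumptions chosen for illustration. The sketch only shows how an exponentially decaying split leaves a smaller share for the ongoing movement as the horizon grows, which is the proposed reason replanning after a perturbation is slower when future targets are visible.

```python
def allocate_planning_resource(n_visible, decay=0.5, total=1.0):
    """Split a fixed planning resource across the visible sequence items.

    Index 0 is the ongoing movement (+0); indices 1.. are future targets.
    Shares decay exponentially with position and always sum to `total`.
    `decay` and `total` are hypothetical parameters, not fitted values.
    """
    if n_visible < 1:
        raise ValueError("n_visible must be at least 1")
    weights = [decay ** k for k in range(n_visible)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def run_sequence(n_items, horizon, decay=0.5):
    """Return (current item, allocation) pairs as the sequence moves forward."""
    steps = []
    for current in range(n_items):
        # Near the end of the sequence fewer future items remain to be planned.
        n_visible = min(horizon, n_items - current)
        steps.append((current, allocate_planning_resource(n_visible, decay)))
    return steps


if __name__ == "__main__":
    for horizon in (1, 2, 3):
        shares = allocate_planning_resource(horizon)
        print(f"Horizon {horizon}:", [round(s, 3) for s in shares])
```

Under these assumptions, the share left for the ongoing movement shrinks as more future targets are visible, while the total stays constant.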

      Recommendations for the author:

      Reviewer #1

      We thank Reviewer 1 for these comments regarding the clarity and consistency of figures and terminology.

      (1) Figure 3. Are "+1 Move" in Fig. 3B and "+ 1 Movement" in Fig. 3C as same as "E + 1" in Fig. 3A? Also does "Dwell" in Fig. 3B mean same as "+1 Dwell" in Fig. 3C? Consistent terminology would help readers to understand the figure.

      “+1 Move” in Figure 3B is the same as +1 movement in Figure 3C. “Dwell” in Figure 3B is the same as +1 Dwell in Figure 3C. We changed the figure for more consistency.

      (2) Figure 3. A typo in the second last line in the legend, "pre-jump target for no-jump and jump and condition". The second "and" isn't necessary.

      The typo is corrected. Thank you.

      (3) Figure 4C. Is "Movement time" equivalent with "E + 1"?

      “Movement time” is equivalent to E+1 only in no-jump conditions. When the jump occurs,

      Movement time contains all the

      (4) Figure 6B. Is the gray circle in between the graph and target positions there by mistake?

      We fixed this typo. Thank you.

      (5) Figure 6E. It's hard to distinguish H2-H5 from the color differences.

      We changed the H5 to full white with a black stroke to improve the contrast. Thank you.

      (6) Figure 7A. Blue dots are almost invisible.

      We added a black stroke to blue circles for more visibility. Thank you.

      Reviewer #2

      I found this manuscript to be engaging and well written--many of the questions I had while reading were answered promptly in the next section. As such, my comments are mostly minor and primarily geared towards improving clarity in the manuscript.

      (1) One major recurring confusion I had while reading the manuscript was how to think about H1, H2, and H3. It was clearly explained in the text, and the explanations of the results were generally clear once I read through it all, but I found it strangely confusing at times when trying to interpret the figures for myself (e.g., in H2, 2 targets are on screen, but the second target can only be planned during the reach toward the first target). This confusion may just be me reading the manuscript over two days, but I wonder if it could be made clearer with some semantic iconography associated with each horizon added to the later figures alongside the H labels. As one option, perhaps the planning timeline part of Fig 1D could be simplified and shrunk down to make an icon for each horizon that clearly shows when planning overlaps for each horizon.

      (Please see the response to point #2 below)

      (2) Regarding Fig 1D: I like this figure, but it's unclear to me how the exact preparation and execution times are determined. Is this more of a general schematic of overlaps, or is there specific information about timing in here?

      We thank reviewer 2 for their important feedback. The role of Figure 1D was to summarize the timing of the experiments for different horizons. That is, to clarify the relative timing of the targets appearing on the screen (shown with a small circle above the horizontal line) and targets being captured by participants (the ticks and their associated number on the line). Execution is shown as the time interval that the hand is moving between the targets and planning is the potential planning time for participants from the target appearing on the screen until initiation of the reach to that target. We added the relevant parts of Figure 1D to the subplots for each subsequent experiment, to summarize the timing of other experiments and their analyses. For the experiments with target jump, a small vertical arrow shows the time of the target jump relative to other events.

      However, this figure is less useful if the connection between the timing dots and ticks is not communicated. We agree that in the original manuscript, this important figure was only briefly explained in the caption of Figure 1. We expanded the explanation in the caption of Figure 1 and referenced the dots and ticks in the main text.

      (3) Fig 6B - for some reason I got confused here: I thought the central target in this figure was the start target, and it took me embarrassingly long to figure out that the green target was the start target. This is likely because I'm used to seeing center-out behavioral figures. Incidentally, I wasn't confused by 7c (in fact, seeing 7c is what made me understand 6b), so maybe the solution is to clearly mark a directionality to the reach trajectories, or to point an arrow at the green target like in previous figures. Also, the bottom left gray target in the figure blends into the graph on the left--I didn't notice it until rereading. Because there's white space between that target and the green one, it might be good to introduce some white space to separate the graph from the targets more. The target arrangement makes more sense in panel C, but by the time I got there, I had already been a bit confused.

      Thanks for raising this point. As shown in Figure 6C, we used the reach to the +1 target for the curvature analysis. The confusion about Figure 6B is probably due to the reach trajectories continuing after the +1 target. That also explains why Figure 7C seemed more straightforward. To solve this issue, we modified Figure 6B such that the reaches are shown with full opacity right until the +1 target and then shown with more transparency. We believe this change focuses the reader's attention on the reach initiated from the +0 target to the +1 target.

      As for the gray target in Figure 6B, we originally had the gray target as it is a potential start location for the reach to the +0 target, and for having similar visuals between the plots. The gray target is now removed from Figure 6B.

      (4) Line 253 - I'm not sure I understand the advantage over simple averaging that the authors mention here--would be nice to get a bit more intuition.

      Thanks for raising this point. We used a two-factor model in our analysis, with the two factors representing the angles of the last and next targets, respectively. Both factors had five levels: -120, -60, 0, 60, and 120 degrees relative to the +1 reach. In a balanced two-factor design, where each combination of factor levels has an equal number of trials, using a linear model and simple averaging would yield equivalent results. However, when the number of trials for the combinations of the two factors is unbalanced, simple averaging can lead to misleading differences between the levels of the second factor. Additionally, the linear model allows us to investigate potential interactions between the two factors, which is not possible with simple averaging.
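
The difference between the two approaches in an unbalanced design can be seen in a small simulated example. This is a sketch under stated assumptions, not the authors' analysis: the effect sizes, the trial counts, and the noise-free data are hypothetical, and NumPy least squares stands in for whatever statistics package was actually used. In this toy data set the next target has no true effect, yet simple averaging reports a spurious one because the last-target angle is unevenly distributed across next-target levels.

```python
import numpy as np

levels = np.array([-120, -60, 0, 60, 120])            # target angles (deg) relative to the +1 reach
last_effect = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # hypothetical effect of the last target
next_effect = np.zeros(5)                             # hypothetical: the next target has no effect

# Unbalanced design: matching (last, next) combinations are over-represented.
last_idx, next_idx = [], []
for i in range(5):
    for j in range(5):
        n_trials = 10 if i == j else 1
        last_idx += [i] * n_trials
        next_idx += [j] * n_trials
last_idx = np.array(last_idx)
next_idx = np.array(next_idx)
curvature = last_effect[last_idx] + next_effect[next_idx]  # noise-free for clarity

# Simple averaging: mean curvature at each next-target level.
simple_means = np.array([curvature[next_idx == j].mean() for j in range(5)])


def dummies(idx, n_levels=5):
    """Dummy-code a factor, using its first level as the reference."""
    return (idx[:, None] == np.arange(1, n_levels)).astype(float)


# Linear model: intercept plus dummy-coded main effects of both factors.
X = np.column_stack([np.ones(len(curvature)), dummies(last_idx), dummies(next_idx)])
beta, *_ = np.linalg.lstsq(X, curvature, rcond=None)
next_coefs = beta[5:]  # next-target effects relative to the -120 degree level

print("simple means by next-target level:", np.round(simple_means, 3))
print("linear-model next-target effects: ", np.round(next_coefs, 3))
```

Interaction terms could be tested in the same framework by adding products of the dummy columns to the design matrix, which has no counterpart in simple averaging.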

      (5) Fig 7a - I would have liked to see the traces labeled in figure (i.e. hand trajectory vs. eye trajectory)

      Hand and eye trajectories are now labeled in the figure.

      (6) Fig 7c - very minor, but the hexagon of targets is rotated 30 degrees from all previous hexagons shown (also, this hex grid target arrangement can't lead to the trajectory shown in 7a, so it can't be that this was a different experimental grid). I'm guessing this was a simple oversight.

      We used the same grid in the eye-tracking experiment. The targets have been adjusted to visually match the previous plots. Thank you for raising this point.

      Reference

      Clavagnier, S., Prado, J., Kennedy, H., & Perenin, M.-T. (2007). How humans reach: distinct cortical systems for central and peripheral vision. The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology and Psychiatry, 13(1), 22–27.

      González-Alvarez, C., Subramanian, A., & Pardhan, S. (2007). Reaching and grasping with restricted peripheral vision. Ophthalmic & Physiological Optics: The Journal of the British College of Ophthalmic Opticians, 27(3), 265–274.

      Kalidindi, H. T., & Crevecoeur, F. (2023). Task dependent coarticulation of movement sequences (p.2023.12.15.571847). https://doi.org/10.1101/2023.12.15.571847

      Miller, G. A. (1956). The magical number seven plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.

      Sivak, B., & MacKenzie, C. L. (1990). Integration of visual information and motor output in reaching and grasping: the contributions of peripheral and central vision. Neuropsychologia, 28(10), 1095–1116.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Magnesium modulates phospholipid metabolism to promote bacterial phenotypic resistance to antibiotics", Li et al demonstrated the role of magnesium in promoting phenotypic resistance in V. alginolyticus. Using standard microbiological and metabolomic techniques, the authors have shown the significance of fatty acid biosynthesis pathway behind the resistance mechanism. This study is significant as it sheds light on the role of an exogenous factor in altering membrane composition, polarization, and fluidity which ultimately leads to antimicrobial resistance.

      Strengths:

      (1) The experiments were carried out methodically and logically.

      (2) An adequate number of replicates were used for the experiments.

      Weaknesses:

      (1) The introduction section needs to be more informative and to the point.

      (2) The weakest point of this paper is in the logistics through the results section. The way authors represented the figures and interpreted them in the results section (or the figure legends) does not match. The figures are difficult to interpret and are not at all self-explanatory.

      (3) There are too many mislabeled figure panels in the main text, which makes it difficult to find out which figures the authors are explaining. There should be more explanation on why and how they did the experiments and how the results were interpreted.

      (1) We will extensively revise the introduction to make it more informative than the current version.

      (2) We will check the descriptions in the text and the labeling in the figures to make sure they are consistent and logical.

      (3) We will add explanations of the experiments to make clear why we performed the assays.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors aimed to identify if and how magnesium affects the ability of two particular bacteria species to resist the action of antibiotics. In my view, the authors succeeded in their goals and presented a compelling study that will have important implications for the antibiotic resistance research community. Since metals like magnesium are present in all lab media compositions and are present in the host, the data presented in this study certainly will inspire additional research by the community. These could include research into whether other types of metals also induce multi-drug resistance, whether this phenomenon can be observed in other bacterial species, especially pathogenic species that cause clinical disease, and whether the underlying molecular determinants (i.e. enzymes) of metal-induced phenotypic resistance could be new antimicrobial drug targets themselves.

      Strengths:

      This study's strengths include that the authors used a variety of methodologies, all of which point to a clear effect of exogenous Mg2+ on drug resistance in the targeted species. I also commend the authors for carrying out a comprehensive study, spanning evaluation of whole cell phenotypes, metabolic pathways, genetic manipulation, to enzyme activity level evaluation. The fact that the authors uncovered a molecular mechanism underlying Mg2+-induced phenotypic resistance is particularly important as the key proteins should be studied further.

      Weaknesses:

      I believe there are weaknesses in the manuscript, however. The authors take for granted that the reader is familiar with all the assays utilized, and do not properly explain some experiments, and thus I highly suggest that the authors add a brief statement in each situation describing the rationale for each selected methodology (more details are in the private review to the authors). The Results section is also quite long and bogs down at times, and I suggest that the authors reduce its length by 10 to 20%. In contrast, the Introduction is sparse and lacks key aspects, for example, there should be mention of the study's main purpose and approaches, plus an introduction to the authors' choice of species and their known drug resistance properties, as well as the drug of choice (balofloxacin). Another notable weakness is that the authors evaluated Mg2+-induced phenotypic resistance only against two closely related species, and thus the generalizability of this mechanism of drug resistance is not known. The paper would be strengthened if the authors could demonstrate this type of phenotypic resistance in at least one more Gram-negative species and at least one Gram-positive species (antimicrobial susceptibility evaluations would suffice), each of which should be pathogenic to humans. Demonstrating magnesium-induced phenotypic drug resistance in the WHO Priority Bacterial Pathogens would be particularly important.

      We will add explanations of the experiments to make clear why we performed the assays. We will also revise the introduction and shorten the manuscript. Expanding the range of bacterial species is a very good idea, and we will perform such experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, Odenwald and colleagues show that mutant biotin ligases used to perform proximity-dependent biotin identification (TurboID) can be used to amplify signal in fluorescence microscopy and to label phase-separated compartments that are refractory to many immunofluorescence approaches. Using the parasite Trypanosoma brucei, they show that fluorescent methods such as expansion microscopy and CLEM, which require bright signals for optimal detection, benefit from the elevated signal provided by TurboID fusion proteins when coupled with labeled streptavidin. Moreover, they show that phase-separated compartments, where many antibody epitopes are occluded due to limited diffusion and potential sequestration, are labeled reliably with biotin deposited by a TurboID fusion protein that localizes within the compartment. They show successful labeling of the nucleolus, likely phase-separated portions of the nuclear pore, and stress granules. Lastly, they use a panel of nuclear pore-TurboID fusion proteins to map the regions of the T. brucei nuclear pore that appear to be phase-separated by comparing antibody labeling of the protein, which is susceptible to blocking, to the degree of biotin deposition detected by streptavidin, which is not. 

      Strengths: 

      Overall, this study shows that TurboID labelling and fluorescent streptavidin can be used to boost signal compared to conventional immunofluorescence in a manner similar to tyramide amplification, but without having to use antibodies. TurboID could prove to be a viable general strategy for labeling phase-separated structures in cells, and perhaps as a means of identifying these structures, which could also be useful. 

      Weaknesses: 

      However, I think that this work would benefit from additional controls to address if the improved detection that is being observed is due to the increased affinity and smaller size of streptavidin/biotin compared to IgGs, or if it has to do with the increased amount of binding epitope (biotin) being deposited compared to the number of available antibody epitopes. I also think that using the biotinylation signal produced by the TurboID fusion to track the location of the fusion protein and/or binding partners in cells comes with significant caveats that are not well addressed here, mostly due to the inability to discern which proteins are contributing to the observed biotin signal. 

      To dissect the contributions of the TurboID fusion to elevating signal, anti-biotin antibodies could be used to determine if the abundance of the biotin being deposited by the TurboID is what is increasing detection, or if streptavidin is essential for this.

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or perhaps from both. However, this question is very hard to answer: antibodies differ massively in their affinity for the antigen, which further depends on the respective IF conditions, and are therefore not directly comparable. Even if anti-biotin gives a better signal than anti-HA, this can be caused by the increase in antigen number (more biotin than HA tag), by the higher binding affinity, or by a combination of both, and these are hard to distinguish. Nevertheless, we have tested a monoclonal mouse anti-biotin antibody targeting the (non-phase-separated) NUP158. We found the signal from the anti-biotin antibody to be much weaker than that from anti-HA, indicating that at least this particular anti-biotin antibody is not a very good binder in IF.

      Alternatively, HaloTag or CLIP tagging could be used to see if diffusion of a small molecule tag other than biotin can overcome the labeling issue in phase-separated compartments. There are Halo-biotin substrates available that would allow the conjugation of 1 biotin per fusion protein, which would allow the authors to dissect the relative contributions of the high affinity of streptavidin from the increased amount of biotin that the TurboID introduces. 

      This is a very good idea, as in this case the signals both come from streptavidin and are directly comparable. We expressed NUP158 with a HaloTag and added PEG-biotin as a Halo ligand. However, PEG-biotin is poorly cell-permeable and is in general only used on lysates. In trypanosomes, cell permeability is particularly restricted, and even Halo ligands that are considered highly cell-penetrant give only a weak signal. Even after overnight incubation, we could not get any signal with PEG-biotin. Our control, the TMR-ligand 647, gave a weak nuclear pore staining, confirming the correct expression and function of the HaloTag-NUP158.

      The idea of using the biotin signal from the TurboID fusion as a means to track the changing localization of the fusion protein or the location of interacting partners is an attractive idea, but the lack of certainty about what proteins are carrying the biotin signal makes it very difficult to make clear statements. For example, in the case of TurboID-PABP2, the appearance of a biotin signal at the cell posterior is proposed to be ALPH1, part of the mRNA decapping complex. However, because we are tracking biotin localization and biotin is being deposited on a variety of proteins, it is not formally possible to say that the posterior signal is ALPH1 or any other part of the decapping complex. For example, the posterior labeling could represent a localization of PABP2 that is not seen without the additional signal intensity provided by the TurboID fusion. There are also many cytoskeletal components present at the cell posterior that could be being biotinylated, not just the decapping complex. Similar arguments can be made for the localization data pertaining to MLP2 and NUP65/75. I would argue that the TurboID labeling allows you to enhance signal on structures, such as the NUPs, and effectively label compartments, but you lack the capacity to know precisely which proteins are being labeled.  

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is confirmed by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be proximity ligation assay, mass spectrometry or specific tagging visualisation of protein suspects by fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy and cheap and highly versatile method to monitor protein interactions in a specific setting.”

      Reviewer #2 (Public Review): 

      Summary: 

      The authors noticed that there was an enhanced ability to detect nuclear pore proteins in trypanosomes using a streptavidin-biotin-based detection approach in comparison to conventional antibody-based detection, and this seemed particularly acute for phase-separated proteins. They explored this in detail for both standard imaging but also expansion microscopy and CLEM, testing resolution, signal strength, and sensitivity. An additional innovative approach exploits the proximity element of biotin labelling to identify where interacting proteins have been as well as where they are. 

      Strengths: 

      The data is high quality and convincing and will have obvious application, not just in the trypanosome field but also more broadly where proteins are tricky to detect or inaccessible due to phase separation (or some other steric limitations). It will be of wide utility and value in many cell biological studies and is timely due to the focus of interest on phase separation, CLEM, and expansion microscopy. 

      Thank you! We are glad you liked it.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors aimed to investigate the effectiveness of streptavidin imaging as an alternative to traditional antibody labeling for visualizing proteins within cellular contexts. They sought to address challenges associated with antibody accessibility and inconsistent localization by comparing the performance of streptavidin imaging with a TurboID-HA tandem tag across various protein localization scenarios, including phase-separated regions. They aimed to assess the reliability, signal enhancement, and potential advantages of streptavidin imaging over antibody labeling techniques. 

      Overall, the study provides a convincing argument for the utility of streptavidin imaging in cellular protein visualization. By demonstrating the effectiveness of streptavidin imaging as an alternative to antibody labeling, the study offers a promising solution to issues of accessibility and localization variability. Furthermore, while streptavidin imaging shows significant advantages in signal enhancement and preservation of protein interactions, the authors must consider potential limitations and variations in its application. Factors such as the fact that tagging may sometimes impact protein function, background noise, non-specific binding, and the potential for off-target effects may impact the reliability and interpretation of results. Thus, careful validation and optimization of streptavidin imaging protocols are crucial to ensure reproducibility and accuracy across different experimental setups. 

      Strengths: 

      - Streptavidin imaging utilizes multiple biotinylation sites on both the target protein and adjacent proteins, resulting in a substantial signal boost. This enhancement is particularly beneficial for several applications with diluted antigens, such as expansion microscopy or correlative light and electron microscopy. 

      - This biotinylation process enables the identification and characterization of interacting proteins, allowing for a comprehensive understanding of protein-protein interactions within cellular contexts. 

      Weaknesses: 

      - One of the key advantages of antibodies is that they label native, endogenous proteins, i.e. without introducing any genetic modifications or exogenously expressed proteins. This is a major difference from the approach in this manuscript, and it is surprising that this limitation is not really mentioned, let alone expanded upon, anywhere in the manuscript. Tagging proteins often impacts their function (if not their localization), and this is also not discussed.

      - Given that BioID proximity labeling encompasses not only the protein of interest but also its entire interacting partner history, ensuring accurate localization of the protein of interest poses a challenge. 

      - The title of the publication suggests that this imaging technique is widely applicable. However, the authors did not show the ability to track the localization of several distinct proteins on the same sample, which could be an additional factor demonstrating the outperformance of streptavidin imaging compared with antibody labeling. Similarly, the work focuses only on small 2D samples. It would have been interesting to be able to compare this with 3D samples (e.g. cells encapsulated in an extracellular matrix) or to tissues.  

      Recommendations for the authors:

      To enhance the assessment from 'incomplete' to 'solid', the reviewers recommend that the following major issues be addressed: 

      Major issues: 

      (1) Anti-biotin antibodies in combination with TurboID labeling should be used to compare the signal/labelling penetrance to streptavidin results. That would show if elevated biotin deposition matters, or if it is really the smaller size, more fluors, and higher affinity of streptavidin that's making the difference. 

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or perhaps from both, and whether size matters (IgG versus streptavidin). However, this question is very hard to answer, as antibodies differ massively in their affinity for the antigen. Thus, even if anti-biotin gave a better signal than anti-HA, this could be caused by the increase in antigen number (more biotin than HA tag), by the better binding affinity, or by a combination of both, and it would not truly answer the question. We have now tested anti-biotin antibodies, also in response to Reviewer 1, and obtained a much poorer signal in comparison to anti-HA or streptavidin.

      Please note that we made another attempt, using nanobodies to target phase-separated proteins, to see whether size matters (Fig. 2I). The nanobody did not stain Mex67 at the nuclear pores but gave a weak nucleolar signal for NOG1, which may suggest that the nanobody penetrates slightly better than IgG, but it does not rule out that the nanobody simply binds with higher affinity. Reviewer 1 has suggested using the HaloTag with PEG-biotin: this would indeed allow a direct comparison of the streptavidin signal caused by the TurboID with that of a single biotin added by the HaloTag. Unfortunately, PEG-biotin does not penetrate trypanosome cells. In conclusion, we are not aware of a method that would allow us to establish why streptavidin, but not IgGs, can penetrate phase-separated areas. We therefore prefer not to overinterpret our data, but to stick to what is supported by the data: “the inability to label phase-separated areas is not restricted to anti-HA but applies to other antibodies”.

      (3) Figure 4 A-B. The validity of claiming the correct localization demonstrated by streptavidin imaging comes into question, especially when endogenous fluorescence, via the fusion protein, remains undetectable (as indicated by the yellow arrow at apex). 

      In this figure, the streptavidin imaging does NOT show the correct localisation of the bait protein, but it does show proteins from historic interactions that have a localisation distinct from that of the bait. We had therefore introduced this chapter with the paragraph below, to make sure the reader is aware of the limitations (which we also see as an opportunity, if properly controlled):

      “We found that in most cases, streptavidin labelling faithfully reflects the steady state localisation of a bait protein, e.g., the localisation resembles those observed with immunofluorescence or direct fluorescence imaging of GFP-fusion proteins. For certain bait proteins, this is not the case, for example, if the bait protein or its interactors have a dynamic localisation to distinct compartments, or if interactions are highly transient. It is thus essential to control streptavidin-based de novo localisation data by either antibody labelling (if possible) or by direct fluorescence of fusion-proteins for each new bait protein.”

      In particular, on lines 450-460, there's a fundamental issue with the argument put forward here. It is not possible to formally know that the posterior labeling is ALPH1 vs. another part of the decapping complex that was associated with PABP2-Turbo, or if the higher detection capacity of the Turbo-biotin label is uncovering a novel localization of the PABP2. While it is likely that it is ALPH1, it is not possible to rule out other possibilities with this approach. These issues should be discussed here and more generally the possibility of off-target labeling with this approach should be addressed in the discussion. 

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is backed up by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be proximity ligation assay, mass spectrometry or specific tagging visualisation of protein suspects by fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy and cheap and highly versatile method to monitor protein interactions in a specific setting.”

      (4) More discussion and acknowledgment of the general limitations in using tagged proteins are needed to balance the manuscript, especially if the hope is to draw a comparison with antibody labeling, which works on endogenous proteins (not requiring a tag). For example: (a) tagging proteins requires genetic/molecular work ahead of time to engineer the constructs and/or cells if trying to tag endogenous proteins; (b) tagged proteins should technically be validated in rescue experiments to confirm the tag doesn't disrupt function in the cell/tissue/context of interest; and (c) exogenous tagged proteins compete with endogenous untagged proteins, which can complicate the interpretation of data.  

      We have added this paragraph to the first paragraph of the discussion part:

      “Like many methods that are frequently used in cell- and molecular biology, streptavidin imaging is based on the expression of a genetically engineered fusion protein: it is essential to validate both, function and localisation of the TurboID-HA tagged protein by orthogonal methods. If the fusion protein is non-functional or mis-localised, tagging at the other end may help, but if not, this protein cannot be imaged by streptavidin imaging. Likewise, target organisms not amenable to genetic manipulation, or those with restricted genetic tools,  are not or less suitable for this method.”

      We would also like to point out that for non-mainstream organisms like trypanosomes, antibodies are not commercially available, and genetic manipulation is often more time-efficient and cheaper than the production of antiserum against the target protein.

      Also, the introduction would ideally be more general in scope and introduce the pros and cons of antibody labeling vs biotin/streptavidin, which are mentioned briefly in the discussion. The fact that the biotin-streptavidin interaction is ~100-fold higher affinity than an IgG binding to its epitope is likely playing a key role in the results here. The difference in size between IgG and streptavidin, the likelihood that the tetrameric streptavidin carries more fluors than a IgG secondary, and the fact that biotin can likely diffuse into phase-separated environments should be clearly stated. The current introduction segues from a previous paper that a more general audience may not be familiar with. 

      We have now added this paragraph to the introduction:

      “It remains unclear, why streptavidin was able to stain biotinylated proteins within these antibody inaccessible regions, but possible reasons are: (i) tetrameric streptavidin is smaller and more compact than IgGs (60 kDa versus a tandem of two IgGs, each with 150 kDa) (ii) the interaction between streptavidin and biotin is ~100 fold stronger than a typical interaction between antibody and antigen and (iii) streptavidin contains four fluorophores, in contrast to only one per secondary IgG.”

      Minor issues: 

      The copy numbers of the HA and Ty1 epitope tags vary depending on the construct being used. For example, Ty1 is found as a single copy tag in the TurboID tag, but on the mNeonGreen tag there are 6 copies of the epitope. It makes it hard to know if differences in detection are due to variations in copies of the epitope tags. Line 372-374: can the authors explain why they chose to use nanobodies in this case? It would be great to show the innate mNeonGreen signal in 2K to compare to the Ty1 labeling. The presence of 6 copies of the Ty1 epitope could be essential to the labeling seen here.

      We agree with the reviewer that these data are a bit confusing. We have now removed Figure 3K, as it is the only construct with six copies of Ty1 instead of one and does not add to the conclusions (the mNeonGreen signal is entirely in the nucleolus, as shown by TrypTag). We have also added an explanation of why we used nanobodies (“The absence of a nanobody signal rules out that its simply the size of IgGs that prevents the staining of Mex67 at the nuclear pores, as nanobodies are smaller than (tetrameric) streptavidin”). However, as stated above, we prefer not to overinterpret the data, as signals from different antibody/nanobody–antigen combinations are not comparable. It was important to us to stress that the absence of signal in phase-separated areas is NOT restricted to the anti-HA antibody, which is clearly supported by the data.

      What does the innate streptavidin background labeling look like in cells that are not carrying a TurboID fusion, from the native proteins that are biotinylated? That should be discussed.

      We have now included the controls without the TurboID fusions for trypanosomes and HeLa cells: “Wild type cells of both Trypanosomes and human showed only a very low streptavidin signal, indicating that the signal from naturally biotinylated proteins is neglectable (Figure S8 in supplementary material).”

      Line 328-331: This is likely to be dependent on whether or not the protein moves to different localizations within the cell. 

      True, we agree, and we have added this paragraph:

      “The one exception are very motile proteins that produce a “biotinylation trail” distinct to the steady state localisation; these exceptions, and how they can be exploited to understand protein interactions, are discussed in chapter 4 below.”

      Line 304-305: Does biotin supplementation not matter at all? 

      No, we never saw any increase in biotinylation when we added extra biotin to trypanosomes. The 0.8 µM biotin concentration in the medium was sufficient.

      Line 326-327: Was the addition of biotin checked for enhancement in the case of the mammalian NUP98? I would argue that there is a significant number of puncta in Figure 1D that are either green or magenta, not both. The amount of extranuclear puncta in the HA channel is also difficult to explain. Biotin supplementation to 500 µM was used in mammalian TurboID experiments in the original Nature Biotech paper- perhaps nanomolar levels are too low. 

      We have now tested HeLa cells with 500 µM biotin and saw an increase in signal, but also in background; due to the increased background, we conclude that low biotin concentrations are more suitable. We have also repeated the experiment using 4HA tags instead of 1HA, and we found a minor improvement in the antibody signal for NUP88 (while the phase-separated NUP54 was still not detectable). We have replaced the images in Figure 1D (NUP88) and in Figure 2F (NUP54) with improved images using 4HA tags. However, we would like to note that single nuclear pore resolution is beyond what can be expected of light microscopy.

      Line 371: In 2I, I see a signal that looks like the nucleus, similar to the Ty1 labeling in 2G, so I don't think it's accurate to say that that Mex67 was "undetectable". Does the serum work for blotting? 

      Thank you, yes, “undetectable” was not the correct phrase here. Mex67 localises to the nuclear pores, to the nucleoplasm and to the nucleolus (GFP-tagging or streptavidin). Antibodies, either to the tag or to the endogenous proteins, fail to detect Mex67 at the nuclear pores and also don’t show any particular enrichment in the nucleolus. They do, however, detect Mex67 in the (not-phase-separated) area of the nucleoplasm. We have changed the text to make this clearer. The Mex67 antiserum works well on a western blot (see for example: Pozzi, B., Naguleswaran, A., Florini, F., Rezaei, Z. & Roditi, I. The RNA export factor TbMex67 connects transcription and RNA export in Trypanosoma brucei and sets boundaries for RNA polymerase I. Nucleic Acids Res. 51, 5177–5192 (2023)).

      Line 477: "lacked" should be "lagged".

      Thank you, corrected.

      Line 468-481: My previous argument holds here - how do you know that the difference in detection here is just a matter of much higher affinity/quantity of binding partner for the avidin?

      See answer to the second point of (3), above.

      483-491: Same issue - without certainty about what the biotin is on, this argument is difficult to make. 

      See answer to the second point of (3), above.

      Line 530: "bone-fine" should be "bonafide"

      Thank you, corrected.

      Line 602: biotin/streptavidin labeling has been used for expansion microscopy previously (Sun, Nature Biotech 2021; PMID: 33288959). 

      Thank you, we had overlooked this! We have now included this reference and describe the differences to our approach more clearly in the discussion part:

      “Fluorescent streptavidin has been previously used in expansion microscopy to detect biotin residues in target proteins produced by click chemistry (Sun et al., 2021). However, to the best of our knowledge, this is the first report that employs fluorescent streptavidin as a signal enhancer in expansion microscopy and CLEM, by combining it with multiple biotinylation sites added by a biotin ligase. Importantly, for both CLEM and expansion, streptavidin imaging is the only alternative approach to immunofluorescence, as denaturing conditions associated with these methods rule out direct imaging of fluorescent tags.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      This study presents valuable framework and findings to our understanding of the brain as a fractal object by observing the stability of its shape property within 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, whether from an evolutionary, fundamental or pathological point of view, and to physicists and mathematicians interested in modeling the shapes of complex objects.

      We have now clarified the outstanding question of whether our model outputs can be related to actual primate brain anatomy, which we believe arose mainly from comments on the validity of our outputs, which appeared to show thicker cortices than nature can produce.

      We address this point in more detail in the point-by-point response below, but want to address this misunderstanding directly here: Our algorithm does not produce thicker cortices with increasing coarse-graining scales; in fact, the cortical thickness never exceeds the actual cortical thickness in our outputs, but rather thins with each coarse-graining scale. In other words, we believe that our outputs are fully in line with neuroanatomy across species.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between brains of young (~20 year old) and old (~80) humans. A challenge the authors acknowledge struggling with in reviewing the manuscript is merging "complex mathematical concepts and a perplexing biological phenomenon." This reviewer remains a bit skeptical about whether the complexity of the mathematical concepts being drawn from is justified by the advances made in our ability to infer new things about the shape of the cerebral cortex.

      To allow scientists from all backgrounds to adopt these complex ideas, we have made our code to “melt” the brains and for further downstream analysis publicly available. We have now also provided a graphical user interface, to allow users without substantial coding experience to run the analysis. We also believe that the algorithmic concepts are easy to understand due to the similarity to the coarse-graining procedures found in long-standing and well-accepted box-counting algorithms.

      Beyond the theoretical insight of the fractal nature of cortices and providing an explicit and crucial link between vastly different brains that are gyrified and those that are not, we believe that the advance gained by our methods for future applications is clearly demonstrated in our proof-of-principle with a four-fold increase in effect size. For reference, an effect size of 8 would translate to an almost perfect separation of groups, i.e. an ideal biomarker with near 100% sensitivity and specificity.
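
      The link between effect size and group separation stated above can be checked with a short calculation. The sketch below is illustrative only: it assumes the effect size is a Cohen's d between two normally distributed groups with equal variance, and that the groups are split at the midpoint between their means. Under those assumptions sensitivity and specificity both equal Φ(d/2). The function names and the baseline value d = 2 are hypothetical and are not taken from the paper.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def midpoint_sensitivity(d: float) -> float:
    """Sensitivity (equal to specificity) when two unit-variance normal
    groups whose means differ by d are split at the midpoint threshold.
    Assumption: d is a Cohen's d with equal group variances."""
    return normal_cdf(d / 2.0)


# d = 2.0 is an arbitrary baseline chosen for illustration; d = 8.0 is the
# value mentioned in the response.
for d in (2.0, 8.0):
    print(f"d = {d}: sensitivity = specificity = {midpoint_sensitivity(d):.5f}")
```

      With these assumptions, d = 8 places the threshold four standard deviations from each group mean, which is why the overlap between the groups is almost nil.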

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1 produces image segmentations that do not resemble real brains.

      As re-iterated in our Methods and Discussion: “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains is purely on the level of morphometrics across the whole cortex.”

      Fig. 1 therefore serves as an explanation to the reader on the algorithmic outputs, but each melted brain is not supposed to be directly/visually compared to actual brains. Similar to algorithms measuring the fractal dimension, or the exposed surface area of a given brain, the intermediate outputs of these algorithms are not supposed to represent any biologically observed brain structures, but rather serve as an abstraction to obtain meaningful morphometrics.

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained and voxelised versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects/voxelisations themselves.

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel. The reason for introducing this bias (and to the extent that it is present in the authors' implementation) is not provided.

      This detail was in the Supplementary, and we have now added additional clarification on this specific point to our Supplementary:

      “In detail, we assign all voxels in the grid with at least four corners inside the original pial surface to the pial voxelization. This process allows the exposed surface to remain approximately constant with increasing voxel sizes. A constant exposed surface is desirable, as we only want to gradually ‘melt’ and fuse the gyri, but not grow the bounding/exposed surface as well. We want the extrinsic area to remain approximately constant as we decrease the intrinsic area via coarse-graining; it is like generating iterates of a Koch curve in reverse, from more to less detailed, by increasing the length of smallest line segment.

      We then assign voxels with all eight corners inside the original white matter surface to the white matter voxelization. This is to ensure integrity of the white matter, as otherwise white matter voxels in gyri may become detached from the core white matter, and thus artificially increase white matter surface area. Indeed, the main results of the paper are not very sensitive to this decision using all eight corners, vs. e.g. only four corners, as we do not directly use white matter surface area for the scaling law measurements. However, we still maintained this choice in case future work wants to make use of the white matter voxelisations or derivative measures.”

      Note on the point of white matter integrity that if both grey and white matter voxelisations require all 8 corners to be inside the respective mesh, there will be voxels not assigned to either at the grey/white matter interface, causing potential downstream issues.
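
      The corner-counting rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' released code: it replaces the pial and white matter meshes with two concentric spheres (a hypothetical stand-in for a real point-in-mesh test), and all function names, radii and grid sizes are assumptions made for the example.

```python
import numpy as np


def corners_inside(inside, origin, voxel_size, shape):
    """For every voxel of a regular grid, count how many of its 8 corners
    lie inside a closed surface.

    inside: callable mapping an (..., 3) array of points to a boolean array
            (hypothetical stand-in for a point-in-mesh test).
    """
    nx, ny, nz = shape
    # The corner lattice has one more point than there are voxels per axis.
    axes = [origin[i] + voxel_size * np.arange(n + 1) for i, n in enumerate(shape)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    corner_in = inside(np.stack([gx, gy, gz], axis=-1)).astype(int)
    count = np.zeros(shape, dtype=int)
    # Sum the inside flags of the 8 corners belonging to each voxel.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                count += corner_in[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return count


def sphere(radius):
    """Toy closed surface: a sphere centred on the origin."""
    return lambda p: np.linalg.norm(p, axis=-1) <= radius


def voxelise(pial_inside, white_inside, origin, voxel_size, shape):
    """Apply the rule quoted above: at least 4 corners inside the pial
    surface, and all 8 corners inside the white matter surface."""
    pial = corners_inside(pial_inside, origin, voxel_size, shape) >= 4
    white = corners_inside(white_inside, origin, voxel_size, shape) == 8
    grey = pial & ~white  # voxels in the pial volume but not in white matter
    return pial, white, grey


# Toy example: radii and grid size are arbitrary illustration values.
pial, white, grey = voxelise(sphere(10.0), sphere(7.0),
                             origin=(-12.0, -12.0, -12.0),
                             voxel_size=1.0, shape=(24, 24, 24))
print(pial.sum(), white.sum(), grey.sum())
```

      Because the grey matter voxels are defined here as the pial voxelisation minus the white matter voxelisation, every voxel inside the pial volume is assigned to exactly one of the two tissues in this toy setup.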

      We further acknowledge:

      “Of course, our proposed procedure is not the only conceivable way to erase shape details below a given scale; and we are actively working on related algorithms that are also computationally cheaper. Nevertheless, the current version requires no fine-tuning, is computationally feasible and conceptually simple, thus making it a natural choice for introducing the methodology and approach.”

      The authors provide an intuitive explanation of why thickness relates to folding characteristics, but ultimately an issue for this reviewer is, e.g., for the right-most panel in Figure 2b, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of typical mammalian neocortex. 

      We assume the reviewer refers to Fig. 1B with the panel on scale=4.9mm. We would like to point out that Fig. 1 serves as an explanation of the voxelisation method. For the actual analysis and Results, we are using re-scaled brains (see Fig. 2 with the ever decreasing brain sizes). The rescaling procedure is now expanded as below:

      “Morphological properties, such as cortical thicknesses measured in our ‘melted’ brains are to be understood as a thickness relative to the size of the brain. Therefore, to analyse the scaling behaviour of the different coarse-grained realisations of the same brain, we apply an isometric rescaling process that leaves all dimensionless shape properties unaffected (more details in Suppl. S3.1). Conceptually, this process fixes the voxel size, and instead resizes the surfaces relative to the voxel size, which ensures that we can compare the coarse-grained realisations to the original cortices, and test if the former, like the latter, also scale according to Eqn. (1). Resizing, or more precisely, shrinking the cortical surface is mathematically equivalent to increasing the box size in our coarse-graining method. Both achieved an erasure of folding details below a certain threshold. After rescaling, as an example, the cortical thickness also shrinks with increasing levels of coarse-graining, and never exceeds the thickness measured at native scale.”
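
      The statement that isometric rescaling leaves dimensionless shape properties unaffected can be illustrated numerically. The sketch below uses made-up values in arbitrary units and two example dimensionless quantities chosen for illustration; the function names and numbers are hypothetical and do not come from the paper.

```python
import math


def rescale(total_area, exposed_area, thickness, factor):
    """Isometric rescaling: lengths scale by `factor`, areas by `factor**2`."""
    return total_area * factor**2, exposed_area * factor**2, thickness * factor


def dimensionless_shape(total_area, exposed_area, thickness):
    """Two example quantities with no units (chosen for illustration)."""
    return {
        "area_ratio": total_area / exposed_area,
        "relative_thickness": thickness / math.sqrt(exposed_area),
    }


# Hypothetical measurements in arbitrary length units.
native = (1800.0, 700.0, 0.25)
shrunk = rescale(*native, factor=0.5)

print(dimensionless_shape(*native))
print(dimensionless_shape(*shrunk))
```

      The absolute thickness halves when the surface is shrunk by a factor of 0.5, while both dimensionless quantities are unchanged, which is the sense in which thickness is to be read relative to brain size.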

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects themselves and their detailed anatomical features.

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4b. It seems this additional spacing between gyri in 80 year olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20 year olds. The authors assert that K provides a more sensitive measure (associated with a large effect size) than currently used ones for distinguishing brains of young vs. old people. A more explicit, or elaborate, interpretation of the numbers produced in this manuscript, in terms of brain shape, might make this analysis more appealing to researchers in the aging field.

      We have removed the main results relating to K and aging from our last revision already to avoid confusion. This is now only in the supplementary analysis, and our claim of K being a more sensitive measure for age and ageing – whilst still true – will be presented in more detail in a series of upcoming papers.

      (3) In the Discussion, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. Given the lack of association between the abstract mathematical parameters described in this study and explicit properties of brain tissue and its constituents, it is difficult to envision how the coarse-graining operation can be used to guide development of "models of cortical gyrification."

      We have clarified in more detail what we meant originally in Discussion:

      “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any models that explicitly simulate a cortical surface as an output could be directly coarse-grained with our method and the morphological trajectories can be compared with those of actual human and primate cortices. The simulated cortices would only be ‘valid’ in terms of the dual universality, if it also produces the same morphological trajectories.”

      However, we agree with the reviewer that our paper could be misread as demanding direct comparisons of each coarse-grained brain with an actual brain, and we have now added the following text to clarify that this is not our intention for the proposed method or outputs.

      “Note, we do not suggest to directly compare coarse-grained brain surfaces with actual biological brain surfaces. As we noted earlier, the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains is purely on the level of morphometrics across the whole cortex.”

      Indeed, the dual universality imposes restrictive constraints on the possible shapes of real cortices, but does not fully specify them. Presumably, the location of individual folds in different individuals and species will depend on their respective evolutionary histories, so there is no reason to expect a match in fold location between the ‘melted’ cortices of more gyrified species, on one hand, and the cortex of a less-gyrified one, on the other, even if their global morphological parameters and global mechanism of folding coincide.

      (4) There are several who advocate for analyzing cortical mid-thickness surfaces, as the pial surface over-represents gyral tips compared to the bottoms of sulci in the surface area. The authors indicate that analyses of mid-thickness representations will be taken on in future work, but this seems to be a relevant control for accepting the conclusions of this manuscript.

      In the context of some applications and methods, we agree that the mid-surface is a meaningful surface to analyse. However, in our work, the mid-surface is not. The fractal estimation rests on the assumption that the exposed area hugs the object of interest (hence the convex hull of the pial surface), as the relationship between the extrinsic and intrinsic areas across scales determines the fractal relationship (Eq. 2). If we used the mid-surface instead of the pial surface for all estimation, this would not represent the actual object of interest, and it would be separated from the convex hull. Estimating a new convex hull based on the mid-surface would be the equivalent of asking for the fractal dimension of the mid-surface, not of the cortical ribbon. In other words, it would be a different question, bound to yield a different answer.

      Hence, we indicated in our original response that we only have a provisional answer, but more work beyond the scope of this paper is required to answer this question, as it is a separate question. The mid-surface, as a morphological structure in its own right, will have its own scaling properties, and our provisional understanding is that these also yield a scaling law parallel to those of the cortical ribbon with the same or a similar fractal dimension. But more systematic work is required to investigate this question at native scale and across scales.

      Reviewer #3 (Public Review):

      Summary: Through a rigorous methodology, the authors demonstrated that within 11 different primates, the shape of the brain followed a universal scaling law with fractal properties. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths: 

      - New hierarchical way of expressing cortical shapes at different scales derived from previous report through implementation of a coarse-graining procedure 

      - Investigation of 11 primate brains and contextualisation with other mammals based on prior literature 

      - Proposition of tool to analyse cortical morphology requiring no fine tuning and computationally achievable 

      - Positioning of results in comparison to previous works reinforcing the validity of the observation. 

      - Illustration of scale-dependence of effects of brain aging in the human.

      Weaknesses: 

      - The notion of cortical shape, while being central to the article, is not really defined, leaving some interpretation to the reader 

      - The organization of the manuscript is unconventional, leading to mixed contents in different sections (sections mixing introduction and method, methods and results, results and discussion...). As a result, the reader discovers the content of the article along the way, it is not obvious at what stages the methods are introduced, and the results are sometimes presented and argued in the same section, hindering objectivity. 

      To improve the document, I would suggest a modification and restructuring of the article such that: 1) by the end of the introduction the reader understands clearly what question is addressed and the value it holds for the community, 2) by the end of the methods the reader understands clearly all the tools that will be used to answer that question (not just the new method), 3) by the end of the results the reader holds the objective results obtained by applying these tools on the available data (without subjective interpretations and justifications), and 4) by the end of the discussion the reader understands the interpretation and contextualisation of the study, and clearly grasps the potential of the method depicted for the better understanding of brain folding mechanisms and properties. 

      We thank this reviewer again for their attention to detail and constructive comments. We have followed the detailed suggestions provided to us in the Recommendations For The Authors, and summarise the main changes here:

      - We have restructured all sections to be more clearly following Introduction, Methods, Results, and Discussion; by using subsections, we believe the structure is now more accessible to readers.

      - We have now clarified the concept of “cortical shape”, as we use it in our paper in several places, by distinguishing clearly the object of study, and the morphological properties measured from it.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): None 

      Reviewer #3 (Recommendations For The Authors): 

      I once again compliment the authors for their elegant work. I am happy with the way they covered my first feedback. My second review takes into account some comments made by other reviewers with which I agree. 

      We thank this reviewer again for their attention to detail and constructive comments.

      Recommendations for clarifications: 

      General comments: The purpose of the article could be made clearer in the introduction. When I differentiate results from discussion, I think of results as objective measures or observations, while discussion will relate to the interpretation of these results (including comparison with previous literature, in most cases). 

      We have restructured all sections to be more clearly following Introduction, Methods, Results, and Discussion; by using subsections, we believe the structure is now more accessible to readers.

      - l.39: define or discuss "cortical shape" 

      We have gone through the entire paper and corrected for any ambiguities. We specifically distinguish between the cortex as a structure overall, shape measures derived from this structure, and coarse-grained versions of the structure.

      - l.48-74: this would match either an introduction or a discussion rather than a methods section. 

      Done

      - l.98-106: this would match a discussion rather than a methods section. 

      Done

      - l.111: here could be a good spot to discuss the 4 vs 8 corners for inclusion of pial vs white matter voxelization 

      We have discussed this in the more detailed Supplementary section now, as after restructuring, this appears to be the more suitable place.

      - l.140-180: it feels that this section mixes methods, results and discussion of the results 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      - l.183-217: mix of results and discussion 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      Small cosmetic suggestions: 

      - l.44: conservation of 'some' quantities: vague 

      Changed to conservation of morphological relationships across evolution

      - l.66: order of citations ([24, 22,23]) 

      Will be fixed at proof stage depending on format of references.

      - l.77: delete space between citation and period 

      Done

      - l.77: I would delete 'say' 

      Done

      - l.86: 'but to also analyse' -> 'to analyse' 

      Done

      - l.105: remove 'we are encouraged that' 

      Done

      - l.111: 'also see' -> 'see also' 

      Done

      - l.164: 'remarkable': subjective 

      Done

      - l.189: define approx. abbreviation 

      Done

      - l.190: 'approx' -> 'approx.' 

      Revised

      - l.195: 'dramatic': subjective 

      removed

      - l.246: 'much' -> vague 

      explained

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in familial SLE. They suggest that ACK1 and BRK deficiencies are associated with human SLE and impair efferocytosis.

      Strengths: 

      The identification of similar mutations in non-receptor tyrosine kinases (NRTKs) in two different families with familial SLE is a significant finding in human disease. Furthermore, the paper provides a detailed analysis of the molecular mechanisms behind the impairment of efferocytosis caused by mutations in ACK1 and BRK.

      Weaknesses: 

      A critical point in this paper is whether the loss of function of ACK1 or BRK contributes to the onset of familial SLE. The authors emphasize that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model, which contributes not to the onset but to the exacerbation of SLE, thus only partially supporting their claim.

      The evidence supporting that the loss of function of ACK1 or BRK contributes to the onset of SLE in the patients from the 2 families mostly relies on the genetic analysis. As the reviewer states, the observation that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model supports the genetic evidence.

      To further address the possible role of ACK1 or BRK variants in the onset of autoimmunity in vivo, we treated wild-type (WT) BALB/cByJ female mice with inhibitors in the absence of pristane.

      The results indicated that mice that had received a weekly injection of ACK1 or BRK inhibitors developed a large array of serum anti-nuclear IgG antibodies, including but not limited to autoantibodies associated with SLE such as anti-histones, anti-chromatin, anti-U1-snRNP, anti-SSA, and anti-Ku, in comparison to the control group (Revised Fig. 3A). However, they did not develop glomerular deposits of IgG after 12 weeks of treatment, in contrast to mice that had received pristane (Revised Fig. 3B,C, Figure 3-figure supplement 1).

      These additional data suggest that inhibition of ACK1 and BRK stimulates the production of serum autoantibodies, which strengthens the claim that ACK1 and BRK kinase deficiency contributes to autoimmunity in BALB/cByJ mice.

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors revealed that genetic deficiencies of ACK1 and BRK are associated with human SLE. First, the authors found compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in one multiplex family and PTK6/BRK in another family. Then, by an experimental blockade of ACK1 or BRK in a mouse SLE model, they found an increase in glomerular IgG deposits and circulating autoantibodies. Furthermore, they reported that ACK and BRK variants from the SLE patients impaired the MERTK-mediated anti-inflammatory response to apoptotic cells in human induced pluripotent stem cell (hiPSC)-derived macrophages. This work identified new SLE-associated ACK and BRK variants and a role for the NRTK TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Strengths: 

      This work identified new SLE-associated ACK and BRK variants and a role for the NRTK TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Weaknesses: 

      Although the manuscript is well-organized and clearly stated, there are some points below that should be considered:

      In this study, the authors used forward genetic analyses to identify novel gene mutations that may cause SLE, combined with GWAS studies of SLE. To further explore the importance of these variants, haplotype analysis of two candidate genes could be performed, to observe the evolution and selection relationship of candidate genes in the population (UK 1000 biobank, for example). 

      To investigate whether ACK1/TNK2 or BRK/PTK6 were subject to selection, we gathered data using different metrics quantifying negative selection in the human genome. We collected the f parameter from SnIPRE (1), LoFtool (2), and EvoTol (3), as well as intraspecies metrics from RVIS (4), LOEUF (5), and pLI (6) (including pRec). We also used our in-house CoNeS metric (7). None of these indicators suggest that the genes are under strong negative selection (Revised Figure 2-figure supplement 2). This is consistent with the deficiency being recessive. We also tested the variants with a MAF greater than 0.005. We found them to be neutral. We therefore did not test whether they were associated with any phenotype in the UK Biobank.

      Although the authors focused on SLE and macrophage efferocytosis in their studies, direct evidence of how macrophage efferocytosis significantly affects SLE is lacking. This point should at least be explicitly introduced and discussed by citing appropriate literature.

      We provide a more detailed description of the role of macrophage efferocytosis in autoimmunity and SLE in the revised manuscript. Specifically, we state (in the results section, paragraph: ACK1 and BRK kinase domain variants may lose the ability to link MERTK to RAC1, AKT and STAT3 activation for efferocytosis): “NRTKs such as ACK1 (8) and PTK2/FAK (9) are also downstream targets of the TAM family receptor MERTK, which is expressed on macrophages and controls the anti-inflammatory engulfment of apoptotic cells, a process known as efferocytosis (10-12). Efferocytosis allows for the clearance of apoptotic cells before they undergo necrosis and release intracellular inflammatory molecules, and simultaneously leads to increased production of anti-inflammatory molecules (TGFb, IL-10, and PGE2) and a decreased secretion of proinflammatory cytokines (TNF-alpha, IL-1b, IL-6) (10-14). In line with these findings, mice deficient in molecular components used by macrophages to efficiently perform efferocytosis, such as MFG-E8, MERTK, TIM4, and C1q, develop phenotypes associated with autoimmunity (10,11,14-27). Furthermore, defects in efferocytosis are also observed in patients with SLE and glomerulonephritis (14,28-31).”

      It is still not clear how the target molecules identified in this paper may influence macrophage efferocytosis. More direct evidence should be established. 

      Our studies show that wild-type ACK1 and BRK, but not the variants, are activated by MERTK, a key receptor that mediates the recognition of apoptotic cells. Our studies also show that wild-type ACK1 and BRK, but not the variants, activate RAC1, which is necessary for engulfment, and phosphorylate AKT and STAT3, which are involved in the anti-inflammatory response to PtdSer recognition.

      The TAM family receptor MERTK mediates recognition of PtdSer on apoptotic cells via GAS6 and Protein S (10,15,32), leading to their engulfment, which involves activation of RAC1 for actin reorganization and the formation of a phagocytic cup (9,33). Using IP kinase assays, we show that MERTK and GAS6 can activate the kinase activity of wild-type ACK1 (8) or BRK but not of the patient’s ACK1 or BRK variant alleles (Figure 4D). To further support the role of ACK1 and BRK downstream from PtdSer recognition and uptake of apoptotic cells, we show that reference ACK1 and BRK alleles, in contrast to the patient variant alleles, can activate RAC1 to generate RAC-GTP, which is necessary for engulfment (9,33) (Figure 4C).

      PtdSer recognition also typically stimulates an anti-inflammatory process mediated in part via AKT (34) and STAT3 and their target genes such as SOCS3 (35-41), and results in the inhibition of LPS-mediated production of inflammatory mediators such as TNF and IL-1b and in the production of cytokines such as IL-10 and TGFb (11,25-27,42). Consistent with this literature and the findings of the paper, we show that reference ACK1 and BRK, unlike the patient’s variant alleles, can phosphorylate AKT and STAT3 (Figure 4A, B). The role of ACK1 and BRK in these signaling pathways is further supported by our transcriptomics data comparing, by RNA-seq, the response of control, patient, and inhibitor-treated iPSC-derived macrophages to apoptotic thymocytes. Specifically, we show that transcriptional repressors including the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-BOX DNA binding ID3 were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages, but this regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      For some transcriptional repressors mentioned in their studies, the authors should check whether there is clear experimental evidence. If not, it is recommended to supplement the experimental verifications for clarity.

      Transcriptional repressors including the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-BOX DNA binding ID3 were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages, but this regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      In the manuscript we cited published evidence, to the best of our knowledge, for the role of these genes in the regulation of inflammatory responses. Specifically, we state: “ATF3, TGIF1, NFIL3, and KLF4 are involved in the negative regulation of inflammation in macrophages (35-38), SOCS3 is an inhibitor of the macrophage inflammatory response and DUSP5 is a negative regulator of ERK activation (39,40,43). These data suggest that the kinase domains of ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells.”

      In Figures 4C and 4D, it is seen that the usage of inhibitors causes cytoskeletal changes; however, this reviewer would not have expected such a large change. Did the authors check whether the cells die after heavy treatment by the inhibitors?

      We carefully examined the viability of isogenic WT, BRK, and ACK1 mutant macrophages (left panel) and of WT macrophages treated with ACK1 or BRK inhibitors, and we did not observe changes in viability (Figure 4-figure supplement 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A crucial step in the development of SLE is the production of autoantibodies. It is shown in Figure 2F that inhibitors of ACK1/BRK enhanced the production of autoantibodies against histones and SSA in a pristane-induced SLE model, which is a significant result that could support the authors' claim. Strangely, this autoantigen panel does not include double-stranded DNA, RNP, or Sm, which should be presented regarding antibody production.

      We thank the reviewer for this comment. In the revised manuscript (Revised Figure 3 – Supplement 1) we added the remainder of the autoantibody panel, which includes double-stranded DNA, RNP, and Sm autoantibody levels. We also added the results for serum IgG autoantibody levels in BALB/cByJ mice that were treated for three months with DMSO, ACK1, or BRK inhibitors but did not receive a pristane injection (Revised Figure 3A). These data show that mice that received ACK1 or BRK inhibitors had increased serum IgG autoantibodies in comparison to DMSO-treated controls.

      Additionally, if there is information that inhibitors of ACK1/BRK promote the differentiation of follicular helper T cells, memory B cells, and plasma cells in a pristane-induced SLE model, it could be considered indirect evidence supporting the authors' claims.

      These are not available at present to the best of our knowledge.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      * In the literature, unpaired t-tests and ordinary one-way ANOVA (Tukey's multiple comparisons test) were used for statistical analysis, which requires data to be normally distributed. This part of the proposal is reflected in the text, and the non-conforming results need to be statistically analyzed using the non-parametric tests of GraphPad Prism.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript, for all applicable datasets, we tested whether the data were normally distributed using a Shapiro-Wilk normality test. For datasets that were normally distributed, statistical significance was determined by a Student t test or ordinary one-way ANOVA with Tukey’s multiple comparisons test, depending on the number of conditions being compared and the experimental setup. In contrast, for datasets that were not normally distributed, statistical significance was determined using a Mann-Whitney test, a Kruskal-Wallis multiple comparisons test, or a Wilcoxon matched-pairs signed rank test, depending on the experimental setup. P values below 0.05 were considered significant for all statistical tests.
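      The test-selection rule described in this response can be sketched as a small decision function. This is only an illustrative sketch, not the authors' analysis code: the function names `choose_test` and `is_significant` are hypothetical, and the mapping from "experimental setup" to a paired or unpaired design is an assumption, since the response does not spell it out.

```python
ALPHA = 0.05  # threshold stated in the response: P values below 0.05 are significant


def choose_test(normal: bool, n_groups: int, paired: bool = False) -> str:
    """Return the name of the test suggested by the rule in the response.

    normal   -- outcome of a normality check (e.g. Shapiro-Wilk) on the dataset
    n_groups -- number of conditions being compared
    paired   -- assumption: True for matched-pairs designs (two groups only)
    """
    if n_groups < 2:
        raise ValueError("at least two conditions are needed for a comparison")
    if normal:
        # Parametric branch: t test for two conditions, ANOVA with Tukey otherwise.
        if n_groups == 2:
            return "Student t test"
        return "one-way ANOVA with Tukey's multiple comparisons test"
    # Non-parametric branch.
    if paired:
        if n_groups != 2:
            raise ValueError("paired designs with more than two groups are not covered")
        return "Wilcoxon matched-pairs signed rank test"
    if n_groups == 2:
        return "Mann-Whitney test"
    return "Kruskal-Wallis test"


def is_significant(p_value: float) -> bool:
    """Apply the significance threshold used for all tests."""
    return p_value < ALPHA


if __name__ == "__main__":
    print(choose_test(normal=True, n_groups=3))
    print(choose_test(normal=False, n_groups=2, paired=True))
```

      The normality decision itself is passed in as a boolean so that the sketch stays independent of any particular statistics package.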

      The authors used different methods to represent the level of significant difference. Therefore, it is suggested that the significance level should be expressed by letters. 

      As suggested by the reviewer, in the revised manuscript we have designated the significance level throughout all figures using letters (p or q values).

      For RNA-seq, more information should be provided in the paper. For example, the correlation between sample biological replicates, the total number of differentially expressed genes, and randomly selected genes for qRT-PCR results verification.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript we provided more information regarding the RNA-seq dataset, including a Principal Component Analysis (PCA) showing correlation between sample replicates (Revised Figure 4-figure supplement 1A), as well as a table indicating the number of upregulated and downregulated genes between relevant datasets (Revised Figure 4-figure supplement 1B).

      The results of the RNA-seq analysis indicated that ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells. The MERTK-dependent anti-inflammatory program elicited by apoptotic cells in macrophages is best evidenced by the reduction of LPS-mediated production of inflammatory mediators such as TNF or IL1b (25-27,34,44). Therefore, to validate the RNA-seq results in a functional manner, we tested the decrease of LPS-induced production of TNF and IL1b by apoptotic cells in isogenic WT, ACK1-deficient, and BRK-deficient macrophages. Consistent with the RNA-seq data, the functional assays indicated that ACK1 and BRK kinase activities are required for the decrease of TNF and IL1b production induced by LPS in response to apoptotic cells (Revised Figure 4H,I).

      The raw data files for the RNA-seq analysis have been deposited in the NCBI Gene Expression Omnibus under accession number GEO: GSE118730.

      The authors did not have the formats for some of the citations correct. This should be fixed. 

      References were reformatted.

      (1) Eilertson, K. E., Booth, J. G. & Bustamante, C. D. SnIPRE: selection inference using a Poisson random effects model. PLoS Comput Biol 8, e1002806 (2012). https://doi.org/10.1371/journal.pcbi.1002806

      (2) Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60 706 individuals. Bioinformatics 33, 471-474 (2017). https://doi.org/10.1093/bioinformatics/btv602

      (3) Rackham, O. J., Shihab, H. A., Johnson, M. R. & Petretto, E. EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization. Nucleic Acids Res 43, e33 (2015). https://doi.org/10.1093/nar/gku1322

      (4) Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9, e1003709 (2013). https://doi.org/10.1371/journal.pgen.1003709

      (5) Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020). https://doi.org/10.1038/s41586-020-2308-7

      (6) Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016). https://doi.org/10.1038/nature19057

      (7) Rapaport, F. et al. Negative selection on human genes underlying inborn errors depends on disease outcome and both the mode and mechanism of inheritance. Proc Natl Acad Sci U S A 118 (2021). https://doi.org/10.1073/pnas.2001248118

      (8) Mahajan, N. P., Whang, Y. E., Mohler, J. L. & Earp, H. S. Activated tyrosine kinase Ack1 promotes prostate tumorigenesis: role of Ack1 in polyubiquitination of tumor suppressor Wwox. Cancer Res 65, 10514-10523 (2005). https://doi.org/10.1158/0008-5472.CAN-05-1127

      (9) Wu, Y., Singh, S., Georgescu, M. M. & Birge, R. B. A role for Mer tyrosine kinase in alphavbeta5 integrin-mediated phagocytosis of apoptotic cells. J Cell Sci 118, 539-553 (2005). https://doi.org/10.1242/jcs.01632

      (10) Scott, R. S. et al. Phagocytosis and clearance of apoptotic cells is mediated by MER. Nature 411, 207-211 (2001). https://doi.org/10.1038/35075603

      (11) Henson, P. M. & Bratton, D. L. Antiinflammatory effects of apoptotic cells. J Clin Invest 123, 2773-2774 (2013). https://doi.org/10.1172/JCI69344

      (12) Henson, P. M. Cell Removal: Efferocytosis. Annu Rev Cell Dev Biol 33, 127-144 (2017). https://doi.org/10.1146/annurev-cellbio-111315-125315

      (13) deCathelineau, A. M. & Henson, P. M. The final step in programmed cell death: phagocytes carry apoptotic cells to the grave. Essays Biochem 39, 105-117 (2003). https://doi.org/10.1042/bse0390105

      (14) Nagata, S. Apoptosis and Clearance of Apoptotic Cells. Annu Rev Immunol 36, 489-517 (2018). https://doi.org/10.1146/annurev-immunol-042617-053010

      (15) Cohen, P. L. et al. Delayed apoptotic cell clearance and lupus-like autoimmunity in mice lacking the c-mer membrane tyrosine kinase. J Exp Med 196, 135-140 (2002). https://doi.org/10.1084/jem.20012094

      (16) Hanayama, R. et al. Autoimmune disease and impaired uptake of apoptotic cells in MFG-E8-deficient mice. Science 304, 1147-1150 (2004). https://doi.org/10.1126/science.1094359

      (17) Miyanishi, M., Segawa, K. & Nagata, S. Synergistic effect of Tim4 and MFG-E8 null mutations on the development of autoimmunity. Int Immunol 24, 551-559 (2012). https://doi.org/10.1093/intimm/dxs064

      (18) Colonna, L., Parry, G. C., Panicker, S. & Elkon, K. B. Uncoupling complement C1s activation from C1q binding in apoptotic cell phagocytosis and immunosuppressive capacity. Clin Immunol 163, 84-90 (2016). https://doi.org/10.1016/j.clim.2015.12.017

      (19) Nagata, S., Hanayama, R. & Kawane, K. Autoimmunity and the clearance of dead cells. Cell 140, 619-630 (2010). https://doi.org/10.1016/j.cell.2010.02.014

      (20) Kimani, S. G. et al. Contribution of Defective PS Recognition and Efferocytosis to Chronic Inflammation and Autoimmunity. Front Immunol 5, 566 (2014). https://doi.org/10.3389/fimmu.2014.00566

      (21) Hanayama, R., Tanaka, M., Miwa, K., Shinohara, A., Iwamatsu, A. & Nagata, S. Identification of a factor that links apoptotic cells to phagocytes. Nature 417, 182-187 (2002). https://doi.org/10.1038/417182a

      (22) Kawano, M. & Nagata, S. Lupus-like autoimmune disease caused by a lack of Xkr8, a caspase-dependent phospholipid scramblase. Proc Natl Acad Sci U S A 115, 2132-2137 (2018). https://doi.org/10.1073/pnas.1720732115

      (23) Watanabe-Fukunaga, R., Brannan, C. I., Copeland, N. G., Jenkins, N. A. & Nagata, S. Lymphoproliferation disorder in mice explained by defects in Fas antigen that mediates apoptosis. Nature 356, 314-317 (1992). https://doi.org/10.1038/356314a0

      (24) Singer, G. G., Carrera, A. C., Marshak-Rothstein, A., Martinez, C. & Abbas, A. K. Apoptosis, Fas and systemic autoimmunity: the MRL-lpr/lpr model. Curr Opin Immunol 6, 913-920 (1994).

      (25) Cvetanovic, M. & Ucker, D. S. Innate immune discrimination of apoptotic cells: repression of proinflammatory macrophage transcription is coupled directly to specific recognition. J Immunol 172, 880-889 (2004). https://doi.org/10.4049/jimmunol.172.2.880

      (26) Fadok, V. A., Bratton, D. L., Konowal, A., Freed, P. W., Westcott, J. Y. & Henson, P. M. Macrophages that have ingested apoptotic cells in vitro inhibit proinflammatory cytokine production through autocrine/paracrine mechanisms involving TGF-beta, PGE2, and PAF. J Clin Invest 101, 890-898 (1998). https://doi.org/10.1172/JCI1112

      (27) Voll, R. E., Herrmann, M., Roth, E. A., Stach, C., Kalden, J. R. & Girkontaite, I. Immunosuppressive effects of apoptotic cells. Nature 390, 350-351 (1997). https://doi.org/10.1038/37022

      (28) Herrmann, M., Voll, R. E., Zoller, O. M., Hagenhofer, M., Ponner, B. B. & Kalden, J. R. Impaired phagocytosis of apoptotic cell material by monocyte-derived macrophages from patients with systemic lupus erythematosus. Arthritis Rheum 41, 1241-1250 (1998). https://doi.org/10.1002/1529-0131(199807)41:7<1241::AID-ART15>3.0.CO;2-H

      (29) Baumann, I. et al. Impaired uptake of apoptotic cells into tingible body macrophages in germinal centers of patients with systemic lupus erythematosus. Arthritis Rheum 46, 191-201 (2002). https://doi.org/10.1002/1529-0131(200201)46:1<191::AID-ART10027>3.0.CO;2-K

      (30) Schrijvers, D. M., De Meyer, G. R. Y., Kockx, M. M., Herman, A. G. & Martinet, W. Phagocytosis of apoptotic cells by macrophages is impaired in atherosclerosis. Arterioscl Throm Vas 25, 1256-1261 (2005). https://doi.org/10.1161/01.ATV.0000166517.18801.a7

      (31) Morioka, S., Maueroder, C. & Ravichandran, K. S. Living on the Edge: Efferocytosis at the Interface of Homeostasis and Pathology. Immunity 50, 1149-1162 (2019). https://doi.org/10.1016/j.immuni.2019.04.018

      (32) Seitz, H. M., Camenisch, T. D., Lemke, G., Earp, H. S. & Matsushima, G. K. Macrophages and dendritic cells use different Axl/Mertk/Tyro3 receptors in clearance of apoptotic cells. J Immunol 178, 5635-5642 (2007). https://doi.org/10.4049/jimmunol.178.9.5635

      (33) Mao, Y. & Finnemann, S. C. Regulation of phagocytosis by Rho GTPases. Small GTPases 6, 89-99 (2015). https://doi.org/10.4161/21541248.2014.989785

      (34) Sen, P. et al. Apoptotic cells induce Mer tyrosine kinase-dependent blockade of NF-kappaB activation in dendritic cells. Blood 109, 653-660 (2007). https://doi.org/10.1182/blood-2006-04-017368

      (35) Vergadi, E., Ieronymaki, E., Lyroni, K., Vaporidi, K. & Tsatsanis, C. Akt Signaling Pathway in Macrophage Activation and M1/M2 Polarization. J Immunol 198, 1006-1014 (2017). https://doi.org/10.4049/jimmunol.1601515

      (36) Byles, V. et al. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4, 2834 (2013). https://doi.org/10.1038/ncomms3834

      (37) Liao, X. et al. Kruppel-like factor 4 regulates macrophage polarization. J Clin Invest 121, 2736-2749 (2011). https://doi.org/10.1172/JCI45444

      (38) Roberts, A. W., Lee, B. L., Deguine, J., John, S., Shlomchik, M. J. & Barton, G. M. Tissue-Resident Macrophages Are Locally Programmed for Silent Clearance of Apoptotic Cells. Immunity 47, 913-927 e916 (2017). https://doi.org/10.1016/j.immuni.2017.10.006

      (39) Matsukawa, A. et al. Stat3 in resident macrophages as a repressor protein of inflammatory response. J Immunol 175, 3354-3359 (2005).

      (40) Sica, A. & Mantovani, A. Macrophage plasticity and polarization: in vivo veritas. J Clin Invest 122, 787-795 (2012). https://doi.org/10.1172/JCI59643

      (41) Yi, Z., Li, L., Matsushima, G. K., Earp, H. S., Wang, B. & Tisch, R. A novel role for c-Src and STAT3 in apoptotic cell-mediated MerTK-dependent immunoregulation of dendritic cells. Blood 114, 3191-3198 (2009). https://doi.org/10.1182/blood-2009-03-207522

      (42) Rothlin, C. V., Carrera-Silva, E. A., Bosurgi, L. & Ghosh, S. TAM receptor signaling in immune homeostasis. Annu Rev Immunol 33, 355-391 (2015). https://doi.org/10.1146/annurev-immunol-032414-112103

      (43) Seo, H. et al. Dual-specificity phosphatase 5 acts as an anti-inflammatory regulator by inhibiting the ERK and NF-kappaB signaling pathways. Sci Rep 7, 17348 (2017). https://doi.org/10.1038/s41598-017-17591-9

      (44) Camenisch, T. D., Koller, B. H., Earp, H. S. & Matsushima, G. K. A novel receptor tyrosine kinase, Mer, inhibits TNF-alpha production and lipopolysaccharide-induced endotoxic shock. J Immunol 162, 3498-3503 (1999).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer’s comments

      Reviewer #2 (Public Review):

      Summary: 

      The manuscript focuses on comparison of two PLP-dependent enzyme classes that perform amino acyl decarboxylations. The goal of the work is to understand the substrate specificity and factors that influence catalytic rate in an enzyme linked to theanine production in tea plants.

      Strengths: 

      The work includes x-ray crystal structures of modest resolution of the enzymes of interest. These structures provide the basis for design of mutagenesis experiments to test hypotheses about substrate specificity and the factors that control catalytic rate. These ideas are tested via mutagenesis and activity assays, in some cases both in vitro and in plants. 

      Weaknesses:

      Although improved in a revision, the manuscript could be more clear in explaining the contents of the x-ray structures and how the complexes studied relate to the reactant and product complexes. The manuscript could also be more concise, with a discussion section that is largely redundant with the results and lacking in providing scholarly context from the literature to help the reader understand how the current findings fit in with work to characterize other PLP-dependent enzymes or protein engineering efforts. Some of the figures lack sufficient clarity and description. Some of the claims about the health benefits of tea are not well supported by literature citations.

      Thank you for your insightful comments on our manuscript and your recognition of the strengths of our study. We understand your concerns about the weaknesses mentioned, and we have addressed them appropriately in the revised manuscript. We acknowledge that the discussion section needs to be improved for conciseness and context. We have revised this part by removing the redundant content. We also acknowledge your comments concerning the clarity and description of some figures. We have revisited these figures and revised them, ensuring they are clear and adequately described. Lastly, concerning the claims about the health benefits of tea, we understand your concern about the lack of supporting citations. We have ensured that such claims are backed by valid literature or, where necessary, have omitted these statements.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 21: Alanine Decarboxylase should not be capitalized.

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (2) Line 31: Grammatical error. Also not clear what "evolution analysis" means here. Revise to "Structural comparisons led us to..."

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (3) Line 34: Revise to "Combining a double mutant of CsAlaDC"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (4) Line 35: Change word order to "increased theanine production 672%"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (5) Line 37: meaning unclear. Revise to "provides a route to more efficient biosynthesis of theanine."

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (6) Line 44: I'm not sure that the "health effects" of tea have been proven in placebo controlled studies. And the references provided (2-4 and 5) do not describe original research articles supporting these claims. I would suggest removing these statements from the introduction and at later points in the manuscript.

      Thank you for your thoughtful feedback and suggestions. Based on your suggestion, we have removed these statements: "The popularity of tea is determined by its favorable flavor and numerous health benefits (2-4). The flavor and health-beneficial effects of tea are conferred by the abundant secondary metabolites, including catechins, caffeine, theanine, volatiles, etc (5)." As for the subsequent statement, "It has also many health-promoting functions, including neuroprotective effects, enhancement of immune functions, and potential anti-obesity capabilities, among others," the cited literature substantiates this conclusion.

      (7) Line 58: insert "the" between provided and basis

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (8) Line 100: Not clear what this phrase means, "As expected, CsSerDC was closer to AtSerDC" Please clarify - closer to what?

      We apologize for any confusion caused by the unclear phrasing. By "CsSerDC was closer to AtSerDC," we intended to convey that CsSerDC exhibits a higher degree of sequence homology with AtSerDC than with the other enzymes evaluated in our investigation. However, because the 1.29% difference between 86.21% and 84.92% in amino acid similarity is not statistically significant (Figure 1B and Supplementary table 1 in the original manuscript), we have deleted the relevant descriptions in the revised manuscript.

      (9) Line 112: "were constructed into" makes no sense. It would be better to say the genes for the proteins of interest were inserted into the overexpression plasmid.

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (10) Line 115: missing the word "the" between generated and recombinant

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (11) Line 121: catalyze not catalyzed

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (12) Lines 129 and 130: The reported Km values are really large - in the mM range. Do these values make sense in terms of the available concentrations of the substrates inside the cell?

      The content of alanine in tea plant roots ranges from 0.28 to 4.18 mg/g DW (Yu et al., 2021; Cheng et al., 2017). Correspondingly, the physiological concentration of alanine in tea plant roots is 3.14 mM to 46.92 mM. The content of serine in plants ranges from 0.014 to 17.6 mg/g DW (Kumar et al., 2017). Correspondingly, the physiological concentration of serine in plants is 0.13 mM to 167.48 mM. Therefore, the Km values in this study are within the range of available substrate concentrations inside the cell.

      Yu, Y. et al. (2021) Glutamine synthetases play a vital role in high accumulation of theanine in tender shoots of albino tea germplasm "Huabai 1". J. Agric. Food Chem. 69 (46),13904-13915.

      Cheng, S. et al. (2017) Studies on the biochemical formation pathway of the amino acid L-theanine in tea (Camellia sinensis) and other plants. J. Agric. Food Chem. 65 (33), 7210-7216.

      Kumar, V. et al. (2017) Differential distribution of amino acids in plants. Amino Acids. 49(5), 821-869.
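      The conversion from content (mg/g DW) to concentration (mM) in the response to point (12) can be reproduced with a short calculation. The sketch below is illustrative only: the response does not state how the conversion was done, so the assumption that 1 g of dry weight is equated with 1 mL of volume is inferred from the quoted numbers, the molar masses are standard values rather than values given in the response, and the function name `mg_per_g_to_mM` is hypothetical.

```python
# Molar masses in g/mol (standard values; not stated in the response).
MOLAR_MASS = {"alanine": 89.09, "serine": 105.09}

# Assumption inferred from the quoted numbers: 1 g dry weight is treated as 1 mL.
ML_PER_G_DW = 1.0


def mg_per_g_to_mM(content_mg_per_g: float, molar_mass: float,
                   ml_per_g: float = ML_PER_G_DW) -> float:
    """Convert a content in mg per g dry weight to a concentration in mM."""
    mmol_per_g = content_mg_per_g / molar_mass  # mg/g divided by g/mol gives mmol/g
    mol_per_l = mmol_per_g / ml_per_g           # mmol/mL is numerically mol/L
    return mol_per_l * 1000.0                   # mol/L to mmol/L (mM)


if __name__ == "__main__":
    ranges = {"alanine": (0.28, 4.18), "serine": (0.014, 17.6)}
    for name, (low, high) in ranges.items():
        mm = MOLAR_MASS[name]
        print(f"{name}: {mg_per_g_to_mM(low, mm):.2f} to "
              f"{mg_per_g_to_mM(high, mm):.2f} mM")
```

      Under these assumptions the calculation gives the ranges quoted in the response (3.14 to 46.92 mM for alanine and 0.13 to 167.48 mM for serine). A different water-to-dry-weight ratio would scale all four values by the same factor.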

      (13) Line 211: it is unclear what the phrase "as opposed to wild-type" means. Please clarify.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We intended to communicate that wild-type CsAlaDC and AtSerDC demonstrate decarboxylase activity, whereas the mutated proteins have lost decarboxylation activity. We have addressed this concern in the revised version of the manuscript.

      (14) Line 222: residues not residue

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (15) Line 227 and Figure 4B: It is not clear what the different sequence logos mean in this part of the figure. The caption is too brief and not helpful. And the sentences describing this figure panel are also not sufficiently clear.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have provided a more detailed explanation of this section in the revised manuscript and added additional annotations in the figure caption to provide further clarity.

      (16) Lines 233 and 234: "in the substrate specificity" is awkwardly worded. I would revise to "in selective binding of the appropriate substrate."

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have meticulously revised the description of this section.

      (17) Line 243: a word is missing in this sentence - but I can't figure out the intended meaning or what the missing word is. Rephrase to improve clarity.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised this sentence to: "These findings indicate the essential role of Phe106 in the selective binding of alanine for CsAlaDC."

      (18) Line 255: The "expression system...was carried out" is not correct. I would say the expression system was used - but you probably also want to rearrange the sentences to more directly say what it was used for. Later, the word "the" is also missing.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised this sentence to: "To further verify that Phe106 of CsAlaDC and Tyr111 of AtSerDC were key amino acid residues determining its substrate recognition in planta, we employed the Nicotiana benthamiana transient expression system. "

      (19) Line 273: use "understand" instead of "elucidate" and instead of "we proposed a prediction test:" say "we designed a test of the prediction that..."

      Thank you very much for your careful reading of the manuscript. We have revised this sentence to: “In light of this observation, we postulated a hypothesis:”

      (20) Line 301: I don't think "effectuate" is a word. Replace with something else.

      Thank you very much for your careful reading of the manuscript. We have revised the sentence as: " The biosynthetic pathway of theanine in tea plants comprises two consecutive enzymatic steps: alanine decarboxylase facilitates the decarboxylation of alanine to generate EA, while theanine synthetase catalyzes the condensation reaction between EA and Glu to synthesize theanine. "

      (21) Line 307: replace "activity" with "ability"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (22) Line 322: I didn't find the discussion very useful. Much of it is simply a recap of the results - which is not necessary. The structural comparisons are overly descriptive without providing appropriate rationale or topic sentence structure so that the reader understands why certain details are emphasized. I think the manuscript would be much stronger if this section were not included or integrated more concisely into the results section where appropriate.

      Thank you for your constructive comments. We understand your concerns about the discussion section of our manuscript and acknowledge that it overlaps with the results section. In response, we have revised the discussion to eliminate unnecessary repetition of the results.

      (23) Line 369: "an amino acid devoid of the hydroxyl moiety present in Lys" - what does this mean? Lys does not have a hydroxyl functional group. Please correct so that the sentence makes sense.

      Thank you very much for your careful reading of the manuscript. This sentence states that the amino acid occupying the corresponding position in CsAlaDC is Phe, which lacks one hydroxyl functional group as compared to Lys. We have made modifications to the sentence as follows: "In contrast, the equivalent position in CsAlaDC is occupied by Phe, an amino acid lacking the hydroxyl group. This substitution enhances the hydrophobic nature of the substrate-binding pocket. "

      (24) Line 370: "This structural nuance portends a predisposition for CsAlaDC to select the comparatively hydrophobic amino acid alanine as its suitable substrate." This sentence also makes no sense - please revise to use simpler language so the meaning is more clear.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised the sentence as follows: " Consequently, CsAlaDC demonstrates a unique predilection, selectively binding Ala (an amino acid with comparatively hydrophobic properties) as its preferred substrate."

      (25) Lines 376-384: This section makes several references to "catalytic rings." I have no idea what this term means? If the authors mean a loop structure in the enzyme - please use the term "loop"

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (26) Line 396-397: The authors reference data that is not shown in the manuscript. Either show the data in the results section or do not mention.

      Thank you for your insightful comment regarding the unshown data referenced in the manuscript. We have included Supplementary figure 9 in the revised manuscript to display this data.

      (27) Line 445-446: what is "mutation technology" - if the authors mean site-directed mutagenesis - please use the simpler and more recognizable terminology.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised the sentence as follows: "Based on the findings of this study, site-directed mutagenesis can be employed to modify enzymes involved in theanine synthesis. This modification enhances the capacity of bacteria, yeast, model plants, and other organisms to synthesize theanine, thereby facilitating its application in industrial theanine production."

      Reviewer #3 (Public Review):

      In the manuscript titled "Structure and Evolution of Alanine/Serine Decarboxylases and the Engineering of Theanine Production," Wang et al. solved and compared the crystal structures of Alanine Decarboxylase (AlaDC) from Camellia sinensis and Serine Decarboxylase (SerDC) from Arabidopsis thaliana. Based on this structural information, the authors conducted both in vitro and in vivo functional studies to compare enzyme activities using site-directed mutagenesis and subsequent evolutionary analyses. This research has the potential to enhance our understanding of amino acid decarboxylase evolution and the biosynthetic pathway of the plant specialized metabolite theanine, as well as to further its potential applications in the tea industry.

      Thank you very much for taking the time to review this manuscript. We appreciate all your insightful comments.

      Reviewer #3 (Recommendations For The Authors):

      The additional material added by the authors addresses some of the previously raised questions and enhances the manuscript's quality. However, certain critical issues we pointed out earlier remain unaddressed. Some of the new data also raises new questions. To provide readers with more comprehensive data, the authors should include additional quantitative data and convert the data presented in the reviewer's comments into supplemental figure format.

      Thank you for acknowledging the improvements in the revised manuscript and providing further valuable feedback. We understand your concern about the critical issues that have not been fully addressed and the new questions raised by some of the newly added data. We have strived to address these issues with additional analysis and clarification in our subsequent revision. Regarding your suggestion for more quantitative data and converting the data mentioned in the reviewer's comments into a supplemental figure format, we agree that this would provide a more comprehensive view of the results. We have reformatted the relevant data into supplemental figures to enhance the clarity and accessibility of information. We are grateful for the time and effort you have dedicated to improving our manuscript.

      * Page 5 & Figure 1B

      "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions. However, CsAlaDC is relatively distant from CsSerDC."

      : In Figure 1B, CsSerDC and AtSerDC are in different clades, and this figure does not show that the two enzymes are closest. To provide another quantitative comparison, please provide a matrix table showing amino acid sequence similarities as a supplemental table. 

      Comment: I don't believe that a 1.29% difference between 86.21% and 84.92% in amino acid similarity is statistically significant. Although the authors have rephrased the original sentence, it's improbable that this small 1.29% difference can explain the observed distinction.

      Many thanks. We have carefully considered your comments. Indeed, the 1.29% difference in amino acid similarity cannot reflect the functional difference between the AlaDC and SerDC proteins. We have deleted the relevant descriptions in the revised manuscript.

      * Page 6, Figure 2, Page 23 (Methods)

      "The supernatants were purified with a Ni-Agarose resin column followed by size-exclusion chromatography."

      : What kind of SEC column did the authors use? Can the authors provide the SEC elution profile comparison results and size standard curve?

      Comment: The authors should include the SEC elution profiles as a supplemental figure or incorporate them as a panel in Figure 2. Furthermore, they should provide a description of the oligomeric state of each protein in this experiment. Additionally, there is a significant difference between CsSerDC (65.38 mL) and CsAlaDC (74.37 mL) elution volumes. Can this difference be explained structurally? In comparison to the standard curve of molecular weight provided by the authors, it appears that these proteins are at least homo-tetramers, which contradicts the description in the text. This should be re-evaluated and clarified.  

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have included the SEC elution profile in Supplemental figure 1A and added descriptions of the oligomeric states of the proteins in the revised manuscript. CsSerDC was eluted at 65.38 mL, corresponding to a molecular weight of 292 kDa, which is approximately five times the mass of the monomer (54.7 kDa). However, in the absence of a CsSerDC crystal structure, it remains uncertain whether the protein forms a pentamer. AtSerDC was eluted at 72.25 mL, with a corresponding molecular weight of 155 kDa, which is 3.3 times the monomer (47.3 kDa). CsAlaDC was eluted at 74.37 mL, with a corresponding molecular weight of 127 kDa, which is 2.7 times the monomer (47.3 kDa). The elution profiles suggest that AtSerDC and CsAlaDC potentially exist in homotrimeric form. This observation contradicts our subsequent structural findings, in which each protein is a dimer. A plausible explanation is that the proteins are not ideally spherical. Under such circumstances, the hydrodynamic radius of a protein can exceed the radius expected from its mass, potentially leading to an overestimation of the molecular weight by size-exclusion chromatography [ref].

      References:

      Burgess, R. R. (2018) A brief practical review of size exclusion chromatography: Rules of thumb, limitations, and troubleshooting. Protein Expression and Purification. 150, 81-85.

      Erdner J. M., et al. (2006) Size-Exclusion Chromatography Using Deuterated Mobile Phases. Journal of Chromatography A. 1129(1):41–46.
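
      The oligomeric-state estimates above are simple ratios of apparent to monomeric molecular weight. The sketch below only recomputes those ratios from the numbers quoted in the response; the dictionary layout and variable names are illustrative, and no calibration curve is modelled.

```python
# (apparent MW from SEC in kDa, monomer MW in kDa), as quoted in the response.
proteins = {
    "CsSerDC": (292.0, 54.7),
    "AtSerDC": (155.0, 47.3),
    "CsAlaDC": (127.0, 47.3),
}

# Apparent number of subunits = apparent MW / monomer MW.
ratios = {name: round(apparent / monomer, 1) for name, (apparent, monomer) in proteins.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} x monomer")
```

      The ratios come out at 5.3, 3.3, and 2.7. None is close to the value of 2 expected for a dimer, which is the discrepancy the reviewer raised and which the authors attribute to non-spherical shape.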

      * Page 6 & Page 24 (Methods)

      "The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, and 0.025 mM purified enzyme, was prepared and incubated at standard conditions (45 °C and pH 8.0 for CsAlaDC, 40 °C and pH 8.0 for AtSerDC for 30 min)."

      (1) The enzymatic activities of CsAlaDC and AtSerDC were measured at two different temperatures (45 and 40 °C), but their activities were directly compared. Is there a reason for experimenting at different temperatures?

      (2) Enzyme activities were measured at temperatures above 40 °C, which is not a physiologically relevant temperature and may affect the stability or activity of the proteins. At the very least, the authors should provide temperature-dependent protein stability data (e.g., CD spectra analysis) or, if possible, temperature-dependent enzyme activities, to show that their experimental conditions are suitable for studying the activities of these enzymes.

      Comment: I appreciate the authors for including temperature-dependent enzyme activity data in their study. However, it remains puzzling that plant enzymes were tested at a physiologically irrelevant temperature of 40 and 45 degrees Celsius. Additionally, it may not be appropriate to directly compare enzyme activity measurements at different temperatures. Furthermore, the data at 45 degrees in panel A appears to be an outlier, which contrasts with the overall trend observed in the graph.

      We appreciate your point regarding the testing temperatures for plant enzymes and recognize the importance of conducting experiments under physiologically relevant conditions. However, our intent in operating at these elevated temperatures was to assess the thermal stability of the enzymes, which can be a valuable characteristic in certain applications, such as industrial production processes, and does not necessarily reflect physiological conditions. Our findings indicate that CsAlaDC exhibits its peak activity at 45 °C. This result aligns with previously reported data in the literature [Bai, P. et al. (2021) figure 4e], which strengthens our confidence in the reliability of our experimental outcomes.

      Author response image 1.

      Relative activity of CsAlaDC at different temperatures.

      * Pages 6-7 & Table 1

      (1) Use the correct notation for Km and Vmax. Also, the authors show kinetic parameters and use multiple units (e.g., mmol/L or mM for Km).

      (2) When comparing the catalytic efficiency of enzymes, kcat/Km (or Vmax/Km) is generally used. The authors present a comparison of catalytic activity from results to conclusion. A clarification of what results are being compared is needed.

      Comment: The authors are still comparing catalytic efficiency solely based on the Vmax values. As previously suggested, it would be advisable to calculate kcat/Km and employ it for comparing catalytic efficiencies. Furthermore, based on the data provided by the authors, I conducted a rough calculation of these catalytic efficiencies and did not observe a significant difference, which contrasts with the authors' statement, "These findings indicated that the catalytic efficiency of CsAlaDC is considerably lower than that of both CsSerDC and AtSerDC." This discrepancy requires clarification.  

      We sincerely appreciate your meticulous review and constructive suggestions. We understand the importance of comparing catalytic efficiencies using kcat/Km values rather than relying solely on Vmax values. Following your suggestion, we calculated kcat/Km and reanalyzed our results. The kcat/Km values for CsSerDC and AtSerDC are 152.7 s⁻¹ M⁻¹ and 184.6 s⁻¹ M⁻¹, respectively, and the value for CsAlaDC is 55.7 s⁻¹ M⁻¹. Therefore, the catalytic efficiency of CsSerDC and AtSerDC is approximately three times that of CsAlaDC. What we intended to convey was that the Vmax of CsAlaDC is lower than that of CsSerDC and AtSerDC. Our description in the manuscript was not accurate, and we have corrected it in the revised version.
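
      The comparison requested by the reviewer rests on the standard definitions kcat = Vmax / [E]total and catalytic efficiency = kcat / Km. The sketch below is an illustration under those definitions, not the authors' calculation: the helper `catalytic_efficiency` is hypothetical, and only the three kcat/Km values quoted in the response are taken from the source.

```python
def catalytic_efficiency(vmax_M_per_s: float, enzyme_M: float, km_M: float) -> float:
    """Return kcat/Km in s^-1 M^-1, with kcat = Vmax / [E]total."""
    kcat = vmax_M_per_s / enzyme_M  # turnover number, s^-1
    return kcat / km_M


# kcat/Km values (s^-1 M^-1) quoted in the response.
reported = {"CsSerDC": 152.7, "AtSerDC": 184.6, "CsAlaDC": 55.7}

# Fold difference of each enzyme relative to CsAlaDC.
fold_vs_alaDC = {
    name: round(value / reported["CsAlaDC"], 2) for name, value in reported.items()
}
print(fold_vs_alaDC)
```

      With the quoted values, CsSerDC and AtSerDC come out at roughly 2.7-fold and 3.3-fold the efficiency of CsAlaDC, consistent with the "approximately three times" stated in the response.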

      * Pages 9 & 10

      "This result suggested this Tyr is required for the catalytic activity of CsAlaDC and AtSerDC."

      : The author's results are interesting, but it is recommended to perform the experiments in a specific order. First, experiments should determine whether mutagenesis affects the protein's stability (e.g., CD, as discussed earlier), and second, whether mutagenesis affects ligand binding (e.g., ITC, SPR, etc.), before describing how site-directed mutagenesis alters enzyme activity. In particular, the authors' hypothesis would be much more convincing if they could show that the ligand binding affinity is similar between WT and mutants.

      Comments: While it is appreciated that you have included CD and UV-vis absorption spectra data, it would be more beneficial to provide quantitative data to address the previously proposed binding affinity. I also recommend presenting the data mentioned in the reviewer's comments as a supplementary figure for better clarity and reference.  

      Thank you for your valuable feedback and suggestions. We agree that providing quantitative data would lend more support to our findings and better address the proposed binding affinity.

      It is generally acknowledged that proteins complexed with PLP exhibit a yellow hue, and the ligand PLP forms a Schiff base structure with the ε-amino group of a lysine residue in the protein, with maximum absorbance around 420 nm. However, during our protein purification process, we observed that the purified protein retained its yellow coloration, even when PLP wasn't introduced into the purification buffer. Subsequent absorbance measurements revealed that the protein exhibited absorbance within the aforementioned wavelength (420 nm) (the experimental results are shown in the following figures), implying an inherent presence of the PLP ligand within the protein. This could have resulted from binding with PLP during the protein's expression in E. coli. Consequently, due to this inseparability between the protein and the ligand, obtaining quantitative data through experimental means becomes unfeasible.

      Author response image 2.

      (A) Absorption Spectra of CsAlaDC (WT) and CsAlaDC (Y336F). (B) Absorption Spectra of AtSerDC (WT) and AtSerDC (Y341F).

      Regarding your suggestion about presenting the data mentioned in the reviewer's comments as a supplementary figure, we agree that it is an excellent idea. We have prepared supplementary figure 7 and supplementary figure 8 accordingly, ensuring that they present the required data.

      * Page 10

      "The results showed that 5 mM L-DTT reduced the relative activity of CsAlaDC and AtSerDC to 22.0% and 35.2%, respectively"

      : The authors primarily use relative activity to compare WT and mutants. Can the authors specify the exact experiments, units, and experimental conditions? Is it Vmax or catalytic efficiency? If so, under what specific experimental conditions?

      Response: "However, due to the unknown mechanism of DTT inhibition on protein activity, we have removed this part of the content in the revised manuscript."

      Comment: I believe this requires a more comprehensive explanation rather than simply removing it from the text.  

      Although we have observed that DTT is capable of inhibiting enzyme activity, at present, we are unable to offer a comprehensive explanation for the inhibitory effect of DTT on enzyme activity in terms of its structural and catalytic mechanisms. Further research is required to elucidate the mechanism of action of DTT. It is worth noting, however, that our study does not emphasize investigating the specific inhibitory mechanisms of DTT on enzyme activity. Furthermore, the existing findings do not provide an adequate explanation for the observed phenomenon, leading us to exclude this particular aspect from the content.

      * Pages 10-12

      : The identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC,' along with the subsequent mutagenesis and enzymatic activity assays, is intriguing. However, the current manuscript lacks an explanation and discussion of the underlying reasons for these results. As previously mentioned, it would be helpful to gain insights and analysis from WT-ligand and mutant-ligand binding studies (e.g., ITC, SPR, etc.). Furthermore, the authors' analysis would be more convincing with accompanying structural analysis, such as steric hindrance analysis.

      Comment: While it is appreciated that you have included UV-vis absorption spectra data, it would be more beneficial to provide quantitative data to address the previously proposed binding affinity. I also recommend presenting the data mentioned in the reviewer's comments as a supplementary figure for better clarity and reference.  

      Response: Thank you for your valuable feedback and suggestions. Given that the protein forms a complex with PLP during its expression in E. coli and cannot be dissociated from it, obtaining quantitative data via experimental protocols is rendered impracticable.

      Author response image 3.

      (A) Absorption Spectra of CsAlaDC (WT) and CsAlaDC (F106Y). (B) Absorption Spectra of AtSerDC (WT) and AtSerDC (Y111F).

      Mutant proteins and wild-type proteins exhibited absorption bands at 420 nm, suggesting the formation of a Schiff base between PLP and the active-site lysine residue.

      Regarding your suggestion about presenting the data mentioned in the reviewer's comments as a supplementary figure, we have prepared supplementary figure 7 and supplementary figure 8 accordingly, ensuring that they present the required data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:  

      Overall, the conclusions appear appropriately supported by the data, and the data appear of high quality.

      Strengths:

      The particular strengths of the paper include an impressive combination of genomic and imaging-based approaches and insightful genetically engineered cell systems. The manuscript reports interesting and potentially important findings. The text is generally very well written, the ideas are clearly explained, and the reasoning is easy to follow.

      Weaknesses:

      The main weakness seems to be that the heat and ethanol shock approaches likely elicit pleiotropic effects, and therefore it is a challenge to test the causal relationship between various observations. Nevertheless, even as indirect effects might contribute to some of the authors' observations, the results are definitively worth reporting.  

      We agree that these two proteotoxic stresses can impact cell physiology in multiple ways and discuss this on lines 132-143 and 500-519. Moreover, in this revision we have more rigorously quantified the extent of proteotoxic stress elicited by the 39°C heat shock and 8.5% ethanol stress (Figure 1E; see response 1 to Reviewer 2). We have additionally added new Figure 2 that reveals an important difference in the way Hsf1 and its negative regulator, the Hsp70 co-chaperone Sis1, respond to HS and ES. This difference is evident at two different intensities for each stress as described in more detail below (see response 1 to Reviewer 2).

      Presentation of some of the data could be improved.

      We agree and have made improvements/data additions to multiple figures: Figure 1E; Figures 3A, B; Figures 4A, B; Figure 7 (data drawn from original Fig. 6 and Fig. 6 – fig. suppl. 1 and reorganized); Fig. 8B; Figure 9; Figure 10. Corresponding enhancements to the supplemental figures have been made as well. 

      Reviewer #2:  

      (1) The central finding of the study highlights the different dynamics of Hsf1, Pol II, and gene organization in response to heat shock versus ethanol stress. However, one important limitation to consider is that the two chosen conditions may not be directly comparable. For a balanced assessment, the authors should ideally expose yeast to various ethanol concentrations and different heat shock temperatures, ensuring the observed differences stem from the nature of the stressor rather than suboptimal stress intensity. At the very least, an additional single ethanol concentration point on each side of 8.5% should be investigated to ensure that 8.5% is near the optimum. In fact, comparing the number of Hsp104 foci in the two conditions in Fig. 1E and F suggests that the yeast is likely experiencing different intensities of stress for the chosen heat shock condition and ethanol concentration used in this study.

      We thank the reviewer for this important suggestion. In this revision, we have included an enhanced analysis of the yeast cellular response to each of these stresses. As illustrated in revised Figure 1, the two stresses used throughout this study – 39°C heat shock and 8.5% ethanol stress – both elicit a proteotoxic response, as assayed by the de novo formation of Hsp104 clusters. While 10 min exposure to 8.5% ethanol results in the formation of multiple discrete (spherical) foci, a 10 min exposure to the elevated temperature leads to the appearance of multiple, largely diffuse Hsp104 clusters, some of which are spherical (new Fig. 1D). The difference in morphology notwithstanding, we have attempted to quantify these clusters using Imaris v. 10.0.1 image analysis software; the results are depicted in Fig. 1E. Such quantification suggests that 8.5% ethanol elicits a more intense stress than exposure to 39°C. A caveat is that it is unclear whether diffuse Hsp104 clusters are comparable to compact Hsp104 foci (see response 3 below).

      Beyond the apparent difference in intensity, a new analysis presented in new Figure 2 reveals that heat shock, elicited by temperature upshift to either 39°C or 42°C, induces relocalization of the J-protein Sis1 – a key negative regulator of Hsf1 – from the nucleoplasm to the nucleolar periphery. Sis1’s perinucleolar ring localization agrees with previous findings of 39°C heat-shocked cells (Feder et al., 2021). Ethanol stress, whether 5% or 8.5%, initially causes Sis1 to relocalize diffusely throughout the nucleus and cytosol. At 10 min, Sis1 localizes to the periphery of the nucleus, thereby providing a marked contrast to what is observed in response to heat shock. These new results are described on lines 174-191.

      Taking these two observations together, we asked whether a less severe ethanol stress (5%) would induce Hsf1 puncta. It does, and as rapidly as 8.5% ethanol (data are presented in revised Figure 8-figure supplement 1). Interestingly, in the presence of 5% ethanol, Hsf1 puncta begin to dissolve at 30 min. This strongly contrasts with the case when cells are exposed to 8.5% ethanol (Figure 8; Figure 8-figure supplement 1). As we state in this revision (lines 414-424), the sustained presence of condensates that we originally observed is likely the consequence of the intensity of the proteotoxic stress elicited by exposure to 8.5% ethanol; analogous responses to these two stress conditions have been observed before (lines 495-501). 

      (2) A second significant concern is the use of the term "Hsf1 condensate". Chowdhary et al.'s 2022 Molecular Cell study highlighted an inhomogeneous distribution and rapid dynamics of Hsf1 clustering upon heat shock, with sensitivity to 1,6-hexanediol, which is interpreted as evidence for condensation by LLPS. However, this interpretation has been criticized severely by McSwiggen et al. Genes Dev 2019 and Musacchio EMBO J 2022. It is important to mention that 1,6-hexanediol is known to affect chromatin organization (Itoh et al. Life Science Alliance 2021). Describing such clusters as 'condensates' without further experimental evidence is premature.

      While we appreciate and largely agree with the point made by this reviewer, we prefer to maintain the term “condensate”. Banani et al (2017) originally defined “biomolecular condensate” to mean self-organized membrane-free compartments that concentrate specific biomolecules. It was never meant to imply LLPS although its widespread use in the literature has led to that implication. We clarify our use of this term on lines 99-104.

      (3) Figure 1: Why does ethanol stress at 0 min display a larger number of Hsp104 foci per cell than heat shock at the same time? How are foci defined by the authors? In Fig. 1D, there are many smaller puncta. A comparative assessment of the number and size of foci for heat shock and ethanol stress would be beneficial.

      We thank the reviewer for raising this point and have addressed it as follows.  First, we repeated the assay with a different strain (DPY1561) and increased the number of cells assayed from 40 to 200. This larger sample size created the same T=0 baseline for both stresses (Figure 1E). Second, we define Hsp104 foci as diffraction-limited structures with a diameter of ~0.4 µm (lines 747-749).  Third, employing Imaris v. 10.0.1, we quantified foci size (= volume) and a summary graph has been added to Figure 1E that also displays the number of foci per cell. In the legend to this figure, we point out that to conduct this analysis we assumed that the diffuse Hsp104 clusters seen in HS cells are comparable to the compact Hsp104 foci in ES cells (lines 1169-1171). 

      (4) Figure 2: Selecting a housekeeping gene with consistent expression levels is crucial for meaningful qPCR analysis. Do SCR1 mRNA levels fluctuate during heat shock or ethanol stress?  

      We thank the reviewer for this question. In revised Figure 3 – figure supplement 1C we provide a new graph (reproduced here) revealing that the levels of SCR1 do not significantly change under either heat shock or ethanol stress relative to the non-stressed control (0 min). One-way ANOVA analysis was performed for both HS and ES and p values were 0.094 and 0.083, respectively (calculated using GraphPad Prism 8).

      (5) Additionally, certain genes, such as TMA10 and SSA4, lack visible bars at time 0. Are these levels undetectable? The varying y-axis scales are confusing; presenting data as relative fold changes could offer a clearer perspective.

      Transcript levels for all genes evaluated here are detectable, even in the basal unstressed state. They are not visible on the histogram for certain genes at T= 0 due to the prodigious fold-increase in RNA elicited by heat shock.  However, to address this concern, we have added a bar graph inset displaying basal transcript levels for each gene in revised Figure 3. We reproduce data for SSA4 and TMA10 in the graphs below. In addition, we present transcript levels in new Figure 3 - figure supplement 1 for cells subjected to ethanol stress to allow a better appreciation of their increase over time. 

      Author response image 1.

      (6) Line 239: The evidence for chromatin compaction is unconvincing. An increase in H3 occupancy by ChIP might indicate a reduction in histone exchange dynamics but may not relate to overall chromatin compaction. The authors use H2A-mCherry to suggest a decrease in chromatin volume, but this data is not persuasive. Did the authors observe any changes in nuclear size? Perhaps quantifying chromatin compaction more directly, using signal intensity per volume, would be informative.

      To address this concern, we attempted to quantify integrated density for H2A-mCherry using ImageJ software. While the volume decreased for both stresses, the integrated density only increased for ethanol stress. We speculate that this may be due to photobleaching, which has been reported for heat shock: the combination of heat and acidic pH contributes to loss of fluorescence signal (Alkaabi et al., 2005). While the integrated density supports the idea of global chromatin compaction in the ethanol stress condition, given the above concerns with the HS sample we elected not to present these data.

      (7) Line 340: The claim of a "strong spatiotemporal correlation" isn't evident from the data. Could correlation coefficients be provided? There is potential anti-correlation in Fig. 6 - Figure Supplement 1C.

      We thank the reviewer for this excellent suggestion. We now present an analysis of the correlation between HSP104-HSP12 coalescence and HSP104 transcription for both HS and ES time courses, using single cell data of Figures 7D, 7E and Figure 7 - suppl. 1D. This analysis is presented in new Figure 7F.
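
      As an illustration only, and not the analysis used for Figure 7F, a correlation coefficient between two single-cell readouts can be sketched as below. The choice of the Pearson coefficient, the function name pearson_r, and the example values are all assumptions made for this sketch; with 0/1 scores for coalescence and transcription the result is equivalent to the phi coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-time-point scores for one cell (1 = observed, 0 = not observed).
coalescence = [0, 1, 1, 1, 0, 0]
transcription = [0, 1, 1, 0, 0, 0]
r = pearson_r(coalescence, transcription)
```

      A value near +1 would indicate that the two events tend to occur at the same time points, and a value near -1 would indicate anti-correlation of the kind the reviewer raises.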

      (8) Figure 8: The WT data in Fig 8 seem inconsistent with Fig. 4 (e.g. the interaction frequency for HSP104 and SSA2). Are these fluctuations between experiments, or are they side effects of IAA treatment? The use of ethanol as an IAA solvent vehicle raises concerns. It would be beneficial if the authors could demonstrate that 1.7% ethanol in the control does not induce ethanol stress.

      We acknowledge that there was an inconsistency in the magnitude of intergenic interaction frequencies reported in the two experiments for HSP104 and SSA2. Some of this might be attributed to the fact that different strains were used, W303-1B in Figure 4 and LRY016 (W303-1B; LEU2::pGPD1osTIR1) in Figure 8. Nonetheless, in each experiment there was a prodigious fold-increase in interaction frequency over the no stress (T = 0 min) control for both HS and ES conditions and, moreover, in each experiment the magnitude of this interaction was greater for the 2.5 min HS sample vs. the 10 min ES sample. However, to obviate this concern, we have removed the HSP104-SSA2 analysis from Figure 9 (corresponds to original Fig. 8).

      Regarding the second point, we cannot entirely rule out the concern that the 1.7% ethanol vehicle might impact 3C interaction frequencies. It is unlikely to be significant, however, given that most other pairwise tests evaluated in the two experiments (Figs. 5 and 9) resulted in similar 3C values. In particular, there was no consistent trend towards higher (or lower) interaction frequencies in the IAA experiment of Fig. 9.  

      Reviewer #3:  

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, Histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions…While identifying a mechanistic basis for the results [presented here] would be a tough task perhaps beyond the scope of this study, it would nevertheless be helpful to place these results in context with a series of other studies…importantly, this work left out PMID: 32015439 (HSF1 phase transition mediates stress adaptation and cell fate decisions) which is particularly relevant considering that it shows that it is human HSF1 condensate resolution rather than simple condensate formation that is associated with HSF1 transcriptional activity - which is similar to the findings here with this particular dose of HS resulting in resolution and high transcriptional activity versus ES resulting in resolution failure and lower activity. 

      We thank the Reviewer for pointing out this oversight. In this revision, we cite Gaglia et al., 2020 and several others reporting HSF1 foci formation in human cells exposed to heat shock. The single cell analysis of Gaglia et al. argued that dissolution of large HSF1 foci (aka “nuclear stress bodies”), typically several µm in diameter and localized over satellite III DNA repeats (Jolly et al., 1997, 2002), correlates with HSP gene activation. Importantly, these condensates are postulated to act as reservoirs of HSF1, sequestered away from HSP genes (Gaglia et al., 2020). In contrast, Zhang et al. (2022) have shown that human HSF1 inducibly forms small condensates (~300 nm) that localize over HSP genes and whose formation directly correlates with HSP gene activation (we discuss the Jolly, Gaglia and Zhang findings on lines 382-394). Likewise, our work shows that in yeast, Hsf1 inducibly forms small, dynamic clusters that colocalize with HSR genes within 2.5 min of exposure to elevated temperature; these dissolve ~20-60 min later (Figure 8 and Figure 8 - suppl. 1). In concert with Hsf1 condensate formation, HSR gene repositioning and transcription/Pol II recruitment are likewise evident within 2.5 min. Therefore, in HS cells there exists coordinate induction of condensate formation, Pol II recruitment, transcription and intergenic interactions (for a detailed kinetic analysis of HSR gene interactions, see Figures 5 and 6 of Chowdhary et al., 2017). This tight temporal relationship is absent in ethanol stressed cells (Figures 3, 4, 5, 6, 7, 8; summarized in Figure 10 and Table 1).

      It is also worth noting that the stresses themselves are quite different - ethanol can be used as a carbon source and so beyond inducing proteotoxic stress, the yeast are presumably adapting to this distinct metabolic state. Basically, it is not clear whether these differences are due to the dose of stress, versus we are looking at an early timepoint as ES initiates a genome-wide chromatin restructuring and gene expression reprogramming that goes beyond a response to proteotoxic stress. This reviewer is not suggesting a barrage of new experiments, but perhaps discussion points to contextualize results.

      We thank the reviewer for this suggestion and in our revised manuscript discuss these issues (lines 414-424 and 486-498 [5% vs. 8.5% ethanol]; lines 500-519 [ethanol as a metabolite]).

      Recommendations for the authors:

      Reviewer #1:

      (1) In Figure 1E, the number of foci in control (0 min) cells is very different for the two conditions. Could the authors clarify/check this? Based on the mean numbers at time point 0, the control cells for the ethanol treatment already contain about 10-20 Hsp104 foci, compared to around 5 foci per cell in the control for heat shock.

      We thank the reviewer for raising this point and have repeated the assay with a different strain (DPY1561). As shown in Figure 1E, we have confirmed that the control samples have a similar number of foci.

      (2) In the same Figure 1E, is the P-value relative to the control or the same time point in the other treatment? A comparison across treatments would be necessary to support the claim in lines 168-171 of the text.

      The statistical analysis (Mann-Whitney test) was performed by comparing each stress timepoint to the no stress control. We clarify this in the figure legend.
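
      The comparison scheme described above (each stress time point tested against the same no stress control) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name mann_whitney_u, the foci counts, and the time-point labels are hypothetical, and only the U statistic is computed. A P-value would in practice come from a statistics package (for example, scipy.stats.mannwhitneyu).

```python
def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic for sample_a versus sample_b.

    Counts the (a, b) pairs in which a exceeds b; tied pairs count as 0.5.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical foci-per-cell counts; none of these values come from the study.
control = [4, 5, 6, 5]
stress_timepoints = {
    "10 min": [9, 11, 10, 12],
    "30 min": [6, 7, 5, 8],
}

# Each stress time point is compared with the same unstressed control,
# rather than with the matching time point of the other stress.
u_by_timepoint = {
    label: mann_whitney_u(counts, control)
    for label, counts in stress_timepoints.items()
}
```

      U ranges from 0 to len(sample_a) * len(sample_b); values near either extreme indicate that one group tends to have higher counts than the other.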

      (3) In Figure 1D, the heat-shock condition shows the same cells that are used in the control, but the cells in the ethanol-shock condition are different. This is a bit visually misleading compared to the experimental setup shown in panel 1C. The authors could show the control cells for the ethanol condition as well.

      We thank the reviewer for this excellent suggestion and have added the 0 min image for the ethanol stress conditions.

      (4) In Figure 7B adding images at 60min would help underscore the point that the condensates are stable in ethanol shocked cells.

      We appreciate this suggestion as well and have included a 60 min timepoint for both stresses (Figure 8B). 

      Reviewer #2:

      (1) Line 113: Has it not been established that yeast Hsf1 is constitutively trimeric?

      In yeast, only a fraction of Hsf1 is thought to be constitutively trimeric and it is this species that binds high-affinity HSEs even under non-stressful conditions (Giardina & Lis, 1995; Pincus et al., 2018). We have added this clarification to the text (lines 121-123). 

      (2) Ethanol can precipitate proteins, especially in rich media like YPD. Did the authors notice any protein precipitation? If yes, how do they account for effects due to nutrient loss by precipitation?

      This is an interesting point, but we did not notice any precipitates in either rich or synthetic liquid media containing 8.5% (v/v) ethanol for any of the time points used in the experiments.

      (3) Figure 3: The figure appears incomplete. Can enhancer, promoter, coding region, and 3'UTR be shown consistently for all genes examined?

      In response to this point, we have simplified this figure (new Fig. 4) by uniform presentation of factor occupancy at enhancer, promoter, and coding region loci for all but one of the genes evaluated. For HSP12 (330 bp), we were unable to distinguish promoter from coding region since the average sonicated chromatin fragment obtained using a Bioruptor is ~300 bp. Therefore, we evaluated only the HSP12 coding region for Pol II and histone H3 occupancy. 

      (4) Figure 4: The comparison between heat shock at 2.5 min and ethanol stress at later points is puzzling. Why not use consistent time points as in Fig. 3?

      Time points for the two stresses examined in this figure (new Fig. 5) were selected to represent times of peak intergenic interaction between HSR genes. These times were derived from our earlier analysis of 3C interactions during a heat shock time course (Figs. 5, 6 of Chowdhary et al., 2017) and ES data presented in this study, including Fig. 4 (Pol II ChIP time course) and Fig. 6 (3C time course). Data presented in Figs. 5 and 6 are consistent with the notion that intergenic interactions in cells subjected to ethanol stress are delayed relative to those observed in heat shocked cells, peaking in most cases at ~10 min (vs. ~2.5 min for heat stress (Chowdhary et al., 2017)).  

      (5) Figure 5: Fig. 5B top panel seems to show color inconsistencies for bars at 0 and 120 min. Also, the x-axis on the top left panel seems to have a typo; should it read "10," not "0?"

      We thank the reviewer for the observation. We changed the graphs in new Figure 6 to display the same color for all time points.  We also fixed the typo. 

      (6) Line 302: The evidence presented supports maximal mRNA levels, but the claim of "maximal transcription" requires support from nascent RNA analysis.

      We agree that RT-qPCR measures mRNA abundance, not nascent transcription. We have changed the text to refer to “transcript levels” where pertinent (lines 301-302; 1331-1332).

      (7) How long do loci remain coalescent during heat shock versus ethanol stress? Both 3C and imaging analyses do not differentiate between frequency and duration, which seems essential for understanding interaction dynamics.

      We thank the reviewer for this excellent question. In new Fig. 7D,E (data drawn from Fig. 6 – fig. suppl. 1), HSR gene coalescence detected in single cells over a HS or ES time course is charted.  Interpretable data exist for a small number of cells. Moreover, for both HS and ES states, in certain cells coalescence between the representative Hsf1 target genes HSP104 and HSP12 dissolves and then reappears. With this caveat in mind, the data suggest that HSP104-HSP12 coalescence can last at least 15 min in HS cells and up to 30 min in ES cells. We have not emphasized this point in the manuscript since a far more comprehensive analysis – beyond the scope of this study – is required.

      (8) For longer analyses, how do the authors accommodate potential ethanol concentration changes due to evaporation?

      For liquid cultures, we relied on maintaining minimal changes in the vapor pressure within the experimental vessel; to facilitate that, flasks were tightly covered to minimize evaporation and temperature was kept at 25°C. For most molecular analyses (RT-qPCR, ChIP, 3C), we confined our analysis to the first 60 min. For microscopy, the samples were encased within a concave slide, covered by a coverslip, as illustrated below. In addition, to tightly seal the coverslip on the slide we used petrolatum.  This arrangement minimized evaporation.

      Author response image 2.

      (9) Figure 9: This legend seems to have an incomplete sentence: "(represented using ...)."

      We have substituted an entirely new model in this revised manuscript (new Figure 10) that omits the use of an ellipsis. (We had used it to symbolize a delay in the appearance of HSR gene transcription in ES cells.)

      References  

      Alkaabi, K. M., Yafea, A., & Ashraf, S. S. (2005). Effect of pH on thermal- and chemical-induced denaturation of GFP. Applied Biochemistry and Biotechnology, 126(2), 149–156. https://doi.org/10.1385/ABAB:126:2:149

      Chowdhary, S., Kainth, A. S., & Gross, D. S. (2017). Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress. Molecular and Cellular Biology, 37(24), 1–23. https://doi.org/10.1128/mcb.00292-17

      Feder, Z. A., Ali, A., Singh, A., Krakowiak, J., Zheng, X., Bindokas, V. P., Wolfgeher, D., Kron, S. J., & Pincus, D. (2021). Subcellular localization of the J-protein Sis1 regulates the heat shock response. Journal of Cell Biology, 220(1), e202005165. https://doi.org/10.1083/JCB.202005165

      Gaglia, G., Rashid, R., Yapp, C., Joshi, G. N., Li, C. G., Lindquist, S. L., Sarosiek, K. A., Whitesell, L., Sorger, P. K., & Santagata, S. (2020). HSF1 phase transition mediates stress adaptation and cell fate decisions. Nature Cell Biology, 22(2), 151–158. https://doi.org/10.1038/s41556-019-0458-3

      Giardina, C., & Lis, J. T. (1995). Dynamic protein-DNA architecture of a yeast heat shock promoter. Molecular and Cellular Biology, 15(5), 2737–2744. https://doi.org/10.1128/mcb.15.5.2737

      Jolly, C., Konecny, L., Grady, D. L., Kutskova, Y. A., Cotto, J. J., Morimoto, R. I., & Vourc’h, C. (2002). In vivo binding of active heat shock transcription factor 1 to human chromosome 9 heterochromatin during stress. Journal of Cell Biology, 156(5), 775–781. https://doi.org/10.1083/jcb.200109018

      Jolly, C., Morimoto, R. I., Robert-Nicoud, M., & Vourc’h, C. (1997). HSF1 transcription factor concentrates in nuclear foci during heat shock: Relationship with transcription sites. Journal of Cell Science, 110(23), 2935–2941. https://doi.org/10.1242/jcs.110.23.2935

      Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., & Gross, D. S. (2018). Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome. Molecular Biology of the Cell, 29(26), 3168–3182. https://doi.org/10.1091/mbc.E18-06-0353

      Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., & Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nature Cell Biology, 24(3), 340–352. https://doi.org/10.1038/s41556-022-00846-7

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) In cardiac and renal transplantation, cold preservation in ice remains a common practice for transporting explanted hearts to donors which remains a cheap and easily accessible way of preserving organs. While ex-vivo mechanical circulatory platforms have been developed and are increasingly being utilized to prolong organ viability, cold preservation remains widely used. The authors perfused explanted hearts with oxygenated perfusion preservation devices at subnormothermic temperatures (20-23C) which is even much lower than routinely used in clinical cardiopulmonary bypass scenarios (28-32C) (in the discussion, the authors allude to SNC80's possible "protective effect" in cardiac bypass). It is unclear how much of the hypometabolic state is related to WB3 administration versus hypothermia. The study will benefit from a comparison of WB3 administration and hypothermia in Xenopus, explanted porcine organs versus cold preservation alone to show distinction in biostasis parameters.

      Indeed, we expect that both pharmaceutical interventions and cooling could contribute to a hypometabolic state. To assess this, the control and treated groups were exposed to the same temperatures in both the Xenopus (18°C) and porcine heart (20-23°C) experiments. Therefore, we can conclude that any changes in the treatment group relative to control can be attributed to the introduction of SNC80 or WB3 and not to cooling alone.

      (2) The authors selected SNC80 based on a literature survey where it was identified based on its ability to induce hypothermia and protect against the effects of spinal cord ischemia in rodents. While this makes sense, were other drugs (eg. Puerarin) considered? The induction of hypothermia and spinal cord protective effect of SNC80 may be multifactorial and not necessarily related to its biostatic effects as the authors describe. Please provide some more context into the background of SNC80.

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue protective effects suggested to us that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we homed in on SNC80 as a lead compound. We have added additional information to further clarify the background of SNC80 on pgs. 3-4.

      (3) In most of the models, the primary metric that the authors utilize to characterize metabolic activity is oxygen consumption, which is a somewhat limited indicator. For instance, this does not provide any information, however, on anaerobic metabolic activity. In addition, the ATP/ADP ratio was found to decrease in the organ chips where SNC80 was utilized, but similar findings were not presented for the other models. 

      We thank the reviewer for this important point. We have therefore added additional experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) used in the Organ Chip systems. We have added a description and an interpretation of the results in the section "Stasis induction in cultured human cells and tissues" and mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms. Although the ATP/ADP ratio gave us useful insight into the metabolic activity of Huh7 cells and chips, this method requires transfection and live imaging, which does not suit other models such as Xenopus or whole organs. Additionally, in animal models there may be other confounding factors that might influence ATP/ADP.

      (4) The authors should provide a more detailed explanation of SNC80's mechanisms of interaction with proteins related to transmembrane transport, mitochondrial activity, and metabolic processes. What is the impact of SNC80 on mitochondrial function, particularly ATP production and mitochondrial respiration? Are there changes in mitochondrial membrane potential, electron transport chain activity, or oxidative phosphorylation? In this context, the authors discuss the potential role of NCX1 as a binding target for SNC80 and its various mechanisms in slowing metabolism. However, no experiments have been done to confirm this binding in the present study. Coimmunoprecipitation studies using appropriate antibodies against SNC80 and NCX1 should be considered to demonstrate their direct binding. Additionally, surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) experiments could be employed to quantify the binding affinity between SNC80 and NCX1, providing further evidence of their interaction. These experiments would elucidate the binding mechanism between SNC80 and NCX1 and reveal more information on the mechanism of action for SNC80. 

      We agree that further definition of the mechanism of action is an important next step for this work; however, it is far beyond the scope of the present study.

      (5) The manuscript notes that histological analysis was conducted, but it seems that only example images are provided, such as Figure 4f. Quantified histological data would provide a more thorough understanding of tissue integrity. 

      We have added quantified histological data to the manuscript that was performed by a clinician blinded to the groups and interventions (Figure 4f).

      (6) Some of the points mentioned in the discussion and conclusion are rather strong and based on possible associations such as SNC80's potential vasodilatory capacity conferring a cardioprotective effect, and ability to reversibly suppress metabolism across different temperatures and species. Please tone this down and stay limited to the organs studied. Further, the reversibility of the findings may be more objectively assessed by biomarkers with decreased immunofluorescence in response to ischemia such as troponin I for the heart and albumin for the liver. Additionally, an investigation of proteins involved in inflammation, hypoxia, and key cell death pathways using immunohistochemistry analysis can better describe the impact of treatment on apoptosis/necroptosis. 

      We have revised aspects of the Discussion and Conclusion to focus on the organs studied in the present work (pgs. 14-17). We agree that markers of inflammation, hypoxia, and cell death are critical for assessing tissue health post-treatment. We performed PCR to assess such markers (Figure 4e) and found reductions in inflammatory cytokine and injury biomarker levels. Although we agree that immunohistochemistry may be useful, such as for looking at any spatial patterns of injury, PCR offers broader dynamic range and higher sensitivity and therefore was chosen for this assay.

      (7) What could be the underlying cause of the observed increase in intercellular spacing after SNC80 administration in porcine limbs which also seems to be evident in the heart histology samples? This seems to be more prominent in the SNC80 compared to the vehicle group. 

      Since the muscle bundle areas of baseline and treated tissues were essentially the same, the increase in intercellular space in the SNC80-treated tissue suggests a compensatory reduction in muscle fiber diameter. Intracellular metabolite concentrations have been shown to be quite stable over a large range of metabolic activities (Hochachka et al. 1998). As such, a reduction in metabolic activity induced by SNC80 may reduce the accumulation of intracellular metabolites. To maintain a stable intracellular metabolite concentration, water would have to be expelled, accounting for the increased intercellular space.

      P W Hochachka, G B McClelland, G P Burness, J F Staples, R K Suarez Comp Biochem Physiol B Biochem Mol Biol 120, 17–26 (1998).

      (8) In the Discussion section, it would be valuable to provide a concise interpretation of the lipidomic data, particularly explaining how changes in acylcarnitine and cholesterol ester levels may relate to tadpole metabolism, hibernation, or other biological processes. 

      An interpretation of the lipidomics data has been summarized in the Discussion (pg. 14).

      (9) What are the limitations or disadvantages of the study? Does SNC80 possess any immunomodulatory properties that might affect the outcomes of organ transplantation? Are there specific organs for which SNC80 may not be a suitable preservation agent, and if so, what are the reasons behind this? 

      This study is limited in two ways. The first is that we characterized the function of the donor pig heart outside of the body, and therefore future work will be required to verify the function and quality of the hearts after they have been transplanted. Secondly, SNC80 is not currently approved for use in clinical settings and during earlier pre-clinical trials of the drug, side effects including seizures were noted and its development was halted. It is hypothesized that these seizures are related to SNC80’s delta opioid activity, so we developed a new, non-opioid analog called WB3, which will be used in future work. We have added a description of the prior seizure findings to the text (pg. 5).

      Based on assessment of tissue biomarkers by PCR, it seems that SNC80 does exhibit immunomodulating properties. Because organ transplant recipients are treated with strong immunosuppressants to prevent organ rejection, we anticipate that SNC80 would either further support this goal, have little additional effect, or reduce the amount of additional immunosuppressive drugs that would need to be administered. To date, our data do not suggest that there are specific organs for which SNC80 may not be a suitable preservation agent.

      Reviewer #2:

      (1) The authors developed an analog of a known delta opioid receptor activator SNC80 with three orders of magnitude lesser binding with the delta opioid receptor WB3. This will likely reduce the undesirable effects of SNC80 while preserving the metabolic slowing needed for organ preservation. Yet, most experiments were done with SNC80, not the superior modification, WB3, shown in only a limited set of experiments, Figure 3.  

      We included the WB3 studies in Xenopus to confirm that the biostatic activity is not mediated through the delta opioid receptor. We have only performed a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments. 

      (2) The heart is one of the most challenging organs to preserve, and some experiments are done to establish the metabolic effects of SNC80. However, the biodistribution study, shown in Figure 2, conspicuously omitted the heart. 

      Thank you for this suggestion. We returned to the biodistribution study dataset and were able to measure uptake by the heart at the 1-hour time point. We observe an increase in uptake above levels observed for other tissues at 1 hour and at levels similar to the skeletal muscle at 2 hours (plot below). Unfortunately, the heart was not visible in a sufficient number of Xenopus tissue sections to reevaluate uptake at the 2-hour time point. We were also able to re-evaluate the lipidomics data for the heart. Acylcarnitine and cholesterol ester were not significantly different between vehicle and SNC80-treated groups. The lack of change in acylcarnitine is particularly important since its upregulation has been shown to be a marker for cardiovascular disease in humans (Deda et al. 2022). The expanded lipidomics data have been added to Figure 2.

      Deda O, Panteris E, Meikopoulos T, Begou O, Mouskeftara T, Karagiannidis E, Papazoglou AS, Sianos G, Theodoridis G, Gika H. Correlation of serum acylcarnitines with clinical presentation and severity of coronary artery disease. Biomolecules. 2022 Feb 23;12(3):354.

      Author response image 1.

      (3) I do not understand the design of the electrophysiology and contractility experiments with the porcine hearts. How did you defibrillate the hearts after removal and establishing perfusion? Lines 173-175 on Page 7 state: "After defibrillation with epinephrine, the P and QRS waveforms were visible in ECGs from 3 of 4 SNC80-treated hearts (Table S1), suggesting that those hearts regain atrial and ventricular polarization." Please clarify.

      Defibrillation is done with an electric shock. Also, please show the ECG recordings to support your conclusions about "polarization." What did you mean by "polarization"? Depolarization? Repolarization? Or resting potential. To establish a normal physiological state, please show ECG waveforms and present data on basic ECG characteristics: heart rate, PQ and QT intervals, and P and QRS durations. I recommend perfusion of the porcine heart with WB3, not only SNC80.  

      Hearts were defibrillated by the application of a 10 to 30 Joule electrical shock delivered from internal paddles positioned at the right atrium (negative) across to the left ventricle (positive). Once rhythm was established, 0.5 ml of 1:1000 epinephrine was administered via the aortic inflow. Electrocardiogram (ECG) showed that both vehicle and SNC80-treated hearts exhibited irregular contractions after perfusate flush and during rewarming prior to defibrillation. After defibrillation (10-30 J electrical shock) followed by epinephrine, a regular heartbeat was established in 3 of 4 SNC80-treated hearts, exhibiting normal P and QRS waveforms (Table S1). That observation suggested that the intrinsic atrial and ventricular muscle fiber contractility was preserved, and the overall conduction system of the heart was viable. The pulse rates of SNC80-treated hearts were at or near normal for porcine hearts (70-120 beats/min) after defibrillation. Vehicle-treated hearts exhibited tachycardia following defibrillation, with all exhibiting pulse rates above the normal range for porcine hearts. We have added clarifying text and definitions (pg. 8). We have only performed a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments.

      (4) Pathology data also raises concerns. The histology images shown in Figure 4f are not quantified, and they show apparently higher levels of tissue disruption in SNC80-treated tissue vs vehicle-treated. The test (lines 169-171) confirms this concern: "In some hearts treated with SNC80, greater waviness of muscle fibers was observed, possibly indicating a state of muscle contraction."  

      The histology images shown in Figure 4f were quantified, and the myocardial injury score quantification shows comparable histology between the groups.

      (5) The apparent state of contracture suggests a higher degree of myocardial damage and a high intracellular calcium level in SNC80-treated hearts. 

      The authors suggested that the sodium-calcium exchanger NCX is a possible target of SNC80 and could be responsible for the "hypometabolic state." However, NCX1 is critically important in the extrusion of cytosolic Ca2+ during the diastolic phase. Failure to remove excessive calcium and restore ionic homeostasis would lead to calcium overload and heart failure. 

      The histological assessment does not indicate a higher degree of myocardial damage in SNC80-treated hearts. Our data are not suggestive of high intracellular calcium buildup in SNC80-treated hearts. If that were the case, we would have had challenges restoring the rhythm of the hearts on the Langendorff post-preservation, which was not observed.

      (6) I am surprised the authors did not consider using the gold standard assay for measuring mitochondrial function in cells by the Seahorse Cell Mito Stress Test. 

      Thank you for this important point. We have added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) included in the Organ Chip experiments. We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. We now mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms.   

      Reviewer #3:

      (1) The authors perform a literature search to identify SNC80 as a promising hit. However, the details of the literature search, a list of other potential hits, and the criteria for identification of SNC80 are not described. The hypometabolic effect of SNC80 exposure is well-characterized in the Xenopus model. Furthermore, the authors show that SNC80 localises to the brain, but do not discuss several studies that have pointed to convulsions induced by exposure to high doses of SCN80, and whether this would be apparent in the Xenopus studies. The authors have promising data on the WB3 morpholino that retains or even improves on the hypometabolism phenotype of SCN80 while likely not retaining delta opioid activity. However, this is not functionally demonstrated. Moreover, WB3 is not used in any of the other assays and models used in the study. In the setting of cardiac transplant surgery, co-administration of SNC80 reduces metabolic activity and inflammation, although it is unclear if there is an improvement in recovery of organ function due to SCN80.

      Thank you for raising these important points. We have added details of the process to identify SNC80 (pgs. 3-4) and a discussion of the studies pointing to convulsions with high doses of SNC80 (pg. 5) (which were not observed in Xenopus studies). We have also incorporated measurements of oxygen consumption during WB3 treatment in Xenopus (Figure 3d).

      (2) The reversible induction of hypometabolic status is also demonstrated in two different organ chips. These models could identify the differential response of epithelial cells and vascular cells to drug perfusion, but the authors have mostly focused on the former. Finally, the authors identify specific targets for the hypometabolic effect of SNC80, which is a valuable resource for other screening studies and can form the basis for future work. 

      In the revised manuscript, we have also added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC). We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. We highlight the differences in metabolic response from the four cell types to SNC80 treatment. It is important to note that the metabolism-suppressing effects of SNC80 were most potent in the epithelial cells that were originally derived from highly metabolic tumors (Caco-2 and Huh7) versus primary normal endothelial cells (HUVEC and LSEC), which is also consistent with past work suggesting that targeting of the NCX1 channel might offer a way to slow tumor growth (Wan et al. 2022). Because we observed more prominent effects in epithelial cells in 2D assays, we decided to focus the 3D organ chips assays on epithelial cells.

      Wan, H. et al. NCX1 coupled with TRPC1 to promote gastric cancer via Ca2+/AKT/β-catenin pathway. Oncogene (2022) doi:10.1038/s41388-022-02412-9.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 136, "Based on these intriguing findings with human Organ Chips". No mention of human organ chips was made in the text at this point, suggest rewording.  

      Thank you for identifying this error. We have revised this line (pg. 6).

      (2) Please provide more information on previous studies that have explored other drugs for organ protection, the novelty of the findings of this study, and how the findings of this study compare to prior data. 

      Building on the background of organ preservation drugs provided in the Introduction, we have added details to compare our outcomes to other drugs explored for organ protection (pg. 15).

      (3) The dosing study in Supplemental Figure S1 provides some context on why the authors utilized the 100 µM SNC80 concentration. It would be helpful if the authors could elaborate in the Discussion on the mechanistic rationale for this concentration. 

      This dose was chosen to maximize suppression of metabolic and activity parameters, while ensuring reversibility of biostasis. We have clarified this in the Discussion (pg. 14).

      (4) In Supplement Figure S2a, the y-axis measures the relative metabolic rate. It seems from the text that this is a relative measure of oxygen consumption, but it should be clarified accordingly. 

      We have clarified this point in the Methods section.  

      (5) What is the specific time or time frame when the reversed effect of SNC80 is most pronounced or at its peak? 

      When Xenopus are moved to fresh medium after SNC80 treatment, we observe a 15-minute period during which no reversal is evident from motion measurements. After that period, we observe a gradual, linear recovery over 2 hours. We cannot designate a specific period in which the reversal effect is most pronounced from these data.

      (6) WB3 seems to show a faster and stronger impact on swimming in comparison to SNC80. What could be the potential reasons for this difference, and could this have any clinical implications? 

      From our current data, we understand the key difference to be that SNC80 has greater affinity for the delta opioid receptor compared to WB3. Therefore, we hypothesize that by not interacting with the opioid system, WB3 induces faster and stronger impacts on swimming. In mice, it has been shown that SNC80 directly inhibits forebrain GABAergic neurons via activity at their delta opioid receptors, which leads to convulsions (Chung et al. 2015). Although we do not observe seizure-like behavior in Xenopus, drugs that inhibit GABAergic neurons can produce stimulant effects in vivo. Since WB3 has a lower affinity for the delta opioid receptor, it likely produces less stimulation, leading to faster and stronger suppression of swimming behaviors. Additionally, it is possible that WB3 interacts with additional targets we have not yet identified.

      Chung PC, Boehrer A, Stephan A, Matifas A, Scherrer G, Darcq E, Befort K, Kieffer BL. Delta opioid receptors expressed in forebrain GABAergic neurons are responsible for SNC80-induced seizures. Behavioural brain research. 2015 Feb 1;278:429-34.

      (7) Elaborate on the potential significance of SNC80's distribution in the GI tract, gill region, and skeletal muscle. How might this distribution relate to the observed physiological effects? 

      In Xenopus tadpoles, we observe SNC80 uptake in the gill region and GI tract within 1 hour. The multiple possible routes of uptake in Xenopus (skin, gills, and mouth) may account for the relatively rapid physiological effects observed in our experiments. The uptake observed in the muscle may be specifically responsible for the slowed motion observed in Xenopus activity assays. This has been elaborated upon in the text (pg. 5).

      (8) Please use italics where needed, e.g., in vitro, in vivo, etc. 

      This has been updated throughout the article.

      (9) Supplemental Figure S1 - Is there any reason for having 3 replicates for the 100 µM group compared to the 4 replicates in the other groups? 

      Each group had 4 replicates; however, a review of the replicates for the 100 µM group suggested the presence of a leak or air bubble in one oxygen measurement vial, which, therefore, had to be excluded from the analysis.

      (10) Figure 3 description - 'c' should be bold. 

      Figure 3 has been updated.

      Reviewer #3:

      Title: The title suggests that several candidate compounds are identified but the study focuses primarily on SNC80. Please consider rephrasing to make it more specific to this molecule. Alternatively, the manuscript would be significantly strengthened if more data is provided for WB3. 

      Although the study focuses on SNC80, we introduce an entirely novel molecule, WB3, and therefore, we feel it is more appropriate to indicate that multiple molecules were studied.

      Line 58-59: please cite additional primary literature papers for the different therapeutics discussed. As an example, the authors do not cite or discuss Massen et al PMID: 31743376 which suggests that H2S is able to induce similar hypometabolic effects even at 37 °C. 

      Thank you for this suggestion. We have expanded our discussion of primary literature papers for the therapeutics discussed (pg. 15).

      Line 76 - 77: The authors do not provide any data on the other possible hits from their literature search or methods details on how this was done. No relevant literature has been cited. What criteria were used to finalise SNC80? 

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue-protective effects suggested that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we homed in on SNC80 as a lead compound. We have added additional information to further clarify the background of SNC80 on pgs. 3-4.  

      Line 85 and Lines 342-345 in the Discussion: SNC80 is reported to induce convulsions at high doses in rodents and primates - was this also evident in the Xenopus studies? How does the dose used in the Xenopus studies compare with the high dose (ca. 10 mg/kg) used in primate studies Danielson et al., PMID: 17112570? 

      We did not observe convulsions in SNC80-treated Xenopus. However, we have updated the manuscript to include previous observations of convulsions in rodents and primates treated with SNC80 (pg. 5). Due to a number of differences, it is challenging to directly compare the dosing in Xenopus studies to those in the primate. In the present study, groups of 10 Xenopus were exposed to a 10 mL pool of 100 µM SNC80, which may be absorbed via oral, gill, and skin routes. Primates were dosed with 10 mg/kg delivered intramuscularly. Because these models may result in different drug biodistributions, any direct comparisons would be speculative. Further work in rodent models may help clarify the relevant dosing differences.

      Line 117: what does 'double the concentration' mean? Is this with reference to the dose of SNC80? If so, is this sufficient to completely block opioid receptor activity? 

      Yes, we meant that naltrindole was dosed at double the concentration of SNC80. We have clarified this in the text (pg. 5). Prior work in rodent brain tissue has shown that radiolabeled naltrindole binds to saturation at picomolar to nanomolar concentrations (Yamamura et al. 1992). To confirm our initial observations with naltrindole and SNC80, we also tested a SNC80 analog (WB3) with very low delta opioid activity (Figure 3), which showed similar effects.

      Yamamura MS, Horvath R, Toth G, Otvos F, Malatynska E, Knapp RJ, Porreca F, Hruby VJ, Yamamura HI. Characterization of [3H] naltrindole binding to delta opioid receptors in rat brain. Life sciences. 1992 Jan 1;50(16):PL119-24.

      Figure 3c, d: It appears that WB3 is even more effective at rapidly reducing motion and inducing faster recovery, which is an exciting result. However, in 3d it appears that long-term exposure of 8h has detrimental effects since the heart rate remains depressed. Please clarify. 

      Yes, at 8 hours, we observe slow recovery and, in some cases, maintenance of depressed heart rates. This could be because the drug is more lipophilic and might remain in fat tissue for longer times. Although our current goal is to lengthen the time window for heart transplant surgery to 6 hours, we are working on formulating WB3 to optimize safety for longer applications (8+ hours).

      Figure 4: the experiments with the heart transplants are well done, but do not demonstrate an additional protective effect over the current standard of care except for the reduced metabolism. Could the authors discuss this further in the discussion or provide data with WB3, which may show a stronger effect? Scale bars are missing in panel f.  

      In addition to reduced metabolism, we also demonstrate reduced expression of inflammation, hypoxia, and cell death-related markers compared to machine perfusion alone (Figure 4e). The potential protective effect of the biostasis-inducing compounds will be further investigated in a planned orthotopic porcine transplant study in which pigs will be followed up for 6 hours after weaning off a bypass machine, allowing enough time to assess the potential benefit of biostasis-inducing drugs. Additionally, we have added scale bars (Figure 4f).

      Order of manuscript: Line 136 already refers to the organ-chip data, which is only presented at the end. Please edit. I feel the manuscript would flow better with the organ-chip data presented before the heart transplant data. 

      Organ-chip data: this is an important component of the story but is only shown in supplementary figures. Consider showing this data in the main figures, as eLife has no space restrictions. Furthermore, it is unclear if the effluent collected and analysed is from apical or vascular, or both. In any case, the analysis via microscopy-based methods appears restricted to the epithelium. The manuscript would be significantly strengthened by providing some data on the effect of SNC80 on vascular cells. 

      As requested, we have moved the Organ Chips results to a main figure (new Fig. 5). We have added additional experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC). We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. The 2D assays showed that metabolism-suppressing effects of SNC80 were most potent in the epithelial cells that were originally derived from highly metabolic tumors (Caco-2 and Huh7) versus endothelial cells (HUVEC and LSEC). Based on these results, we decided to focus the 3D organ chips assays on epithelial cells only, and hence only analyzed effluents from the epithelial (apical) channel.

      Methods section for fabrication of oxygen sensors: Please refer to prior papers from your lab (Grant et al., PMID: 35274118) with regards to details of the fabrication of the devices with inbuilt oxygen sensors. 

      The methods used for the fabrication of oxygen sensors will be included in a separate manuscript currently in preparation.  

      Figure S3 and Line 243-244: Please provide the data for untreated control organ chips in panels d and e a mean value for which is quoted in the main text. The images in panel f are too small for the reader to appreciate the point, please provide zooms. Scalebars are also missing from these images. Please increase the number of replicates for S3f - the liver-chip data has only two replicates which has very low power for statistical testing. In general, the number of organ chips used for the data for each panel is missing. 

      As mentioned in the captions, Figure S3 (now Figure S5) panels d and e show average albumin production of Liver Chips at day 7-10 of culture. These measurements were performed before any treatment with SNC80 to characterize the chip’s functional metabolism. In panel g, although we only show biological N=2-3, each datapoint corresponds to an average of multiple fields of view (multiple technical replicates). We have now clarified this in the figure legend.

      Figure S4 - I do not quite understand why the perfusion with the vehicle only also affects oxygen release in the liver chip. Is it possible to use a different vehicle? 

      The liver and gut oxygen levels are not on the same y-axis (gut on the left and liver on the right). The oxygen fold change of the liver control chip is below 0.5, which is in the same range as the gut control chip (0 +/- 0.25). There is a natural variation in oxygen consumption over the lifetime of the chips (now Figure 5c), and untreated cells are metabolically active and consuming oxygen. The small drop observed suggests that liver chips may not have reached a stable oxygen consumption rate at the time of the experiment, whereas the gut chips have stabilized.  

      Figure S5c-f: The units on the Y-axis are missing. 

      Panels S5c-d (now Figure S6c-d) depict the percent cytotoxicity and are thus unitless. Panels S5e-h (now Figure S6e-h) show the effluent levels relative to baseline and are also unitless. We have updated the figure caption to clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead a Hodgkin Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication.

      In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      - The authors report original data.

      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      - The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be due to a build-up of pH changes. We are not sure what the referee means by RedOx potential, since it is an attribute of all chemicals that are able to donate or receive electrons. The electrical response to stress appears to be caused by ROS, since the electrical response is removed when ROS scavengers are added, i.e. pH plays a very minor role, if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work and it is currently under review in a physical journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E. coli has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also following work Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).
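
      As a point of reference for the term "Nernstian dye" used above, the equilibrium accumulation expected of an ideal, membrane-permeant cationic dye follows from the Nernst equation. The sketch below is illustrative only: the function name, the 25 °C temperature and the +1 charge are assumptions made here, not values taken from the manuscript or from the cited calibration paper.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernstian_accumulation(vm_volts, charge=1, temp_kelvin=298.15):
    """Equilibrium ratio c_in / c_out for an ideal Nernstian dye.

    vm_volts is the membrane potential (inside minus outside), in volts.
    charge and temp_kelvin are illustrative assumptions.
    """
    return math.exp(-charge * F * vm_volts / (R * temp_kelvin))


if __name__ == "__main__":
    # Print the expected accumulation for a few hypothetical potentials.
    for vm_mv in (0, -60, -120, -180):
        ratio = nernstian_accumulation(vm_mv / 1000.0)
        print(f"Vm = {vm_mv:5d} mV -> c_in/c_out = {ratio:.3g}")
```

      In this idealised picture a more negative membrane potential gives a larger intracellular dye concentration. Whether ThT behaves this way in E. coli is the point in dispute between the referee and the authors.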

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.
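
      The concentration comparison quoted above can be checked with a few lines of arithmetic. This sketch uses only the two concentrations stated in the response; the variable and function names are illustrative, and the calculation says nothing about membrane permeabilities or channel gating.

```python
import math

k_molar = 0.2    # K+ concentration quoted in the response, mol/L
h_molar = 1e-7   # H+ concentration quoted in the response, mol/L


def concentration_ratio(major, minor):
    """Ratio of two molar concentrations."""
    return major / minor


ratio = concentration_ratio(k_molar, h_molar)
implied_ph = -math.log10(h_molar)  # pH corresponding to the quoted H+ level

print(f"K+/H+ concentration ratio: {ratio:.1e}")
print(f"pH implied by the H+ concentration: {implied_ph:.1f}")
```

      The ratio comes out at about 2 × 10^6, which is the order of magnitude behind the phrase "a million times" in the text.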

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the ΔpH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E. coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration at which the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state that because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.
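
      The relation between the electrochemical gradient of protons, the membrane potential and the pH difference referred to here can be written in the standard chemiosmotic form. Sign conventions differ between texts; the one below, with the pH difference taken as inside minus outside, is an assumption made for illustration.

```latex
\Delta p \;=\; \Delta\psi \;-\; \frac{2.303\,RT}{F}\,\Delta\mathrm{pH},
\qquad
\Delta\mathrm{pH} \;=\; \mathrm{pH}_{\mathrm{in}} - \mathrm{pH}_{\mathrm{out}}

% Fully collapsed proton gradient:
\Delta p = 0
\;\;\Longrightarrow\;\;
\Delta\psi \;=\; \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
\;\approx\; 59\ \mathrm{mV} \times \Delta\mathrm{pH}
\quad (T = 298\ \mathrm{K})
```

      Setting the proton gradient to zero therefore does not by itself force the membrane potential to zero; it only requires the electrical and chemical terms to cancel.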

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permeability is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work showing that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.
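
      For readers unfamiliar with the term, the textbook Nernst relation behind "Nernstian" equilibration can be sketched as follows. This is a minimal illustration, not code from the manuscript or from either cited group: it assumes an ideal monovalent cation at 25 °C that distributes passively across the membrane, and the function names are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_accumulation_ratio(v_mv, z=1, temp_k=298.15):
    """Equilibrium C_in / C_out for an ion of valence z.

    v_mv is the membrane potential (inside minus outside) in millivolts.
    Assumes ideal passive equilibration with no binding, efflux or damage.
    """
    return math.exp(-z * F * (v_mv * 1e-3) / (R * temp_k))


def nernst_potential_mv(c_in, c_out, z=1, temp_k=298.15):
    """Membrane potential (mV) at which the given concentrations are at equilibrium."""
    return 1e3 * (R * temp_k) / (z * F) * math.log(c_out / c_in)


if __name__ == "__main__":
    for v in (0.0, -60.0, -120.0, -180.0):
        print(f"V = {v:7.1f} mV -> C_in/C_out = {nernst_accumulation_ratio(v):.1f}")
```

      Under these idealised assumptions, a cationic dye accumulates roughly tenfold for every 59 mV of hyperpolarization. The disagreement in this exchange is about whether ThT in E. coli follows this relation closely enough, which the sketch itself cannot settle.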

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
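
      The conductance terms described above can be written out as a short sketch. The parameter values are the classic squid giant axon values from Hodgkin and Huxley (1952), expressed in the modern sign convention. They are used here purely for illustration and are not measurements for E. coli or the parameters of the authors' model; the function name is hypothetical.

```python
# Classic squid-axon Hodgkin-Huxley parameters (illustrative only,
# NOT values measured for E. coli or used in the manuscript).
G_K, G_NA, G_L = 36.0, 120.0, 0.3     # maximal conductances, mS/cm^2
E_K, E_NA, E_L = -77.0, 50.0, -54.4   # reversal (Nernst) potentials, mV


def ionic_currents(v, n, m, h):
    """Return (I_K, I_Na, I_L) in uA/cm^2 for membrane potential v in mV.

    n, m and h are the dimensionless gating variables (each between 0 and 1).
    Each current is a conductance times the driving force (v - E_ion).
    """
    i_k = G_K * n**4 * (v - E_K)          # potassium: gK * n^4
    i_na = G_NA * m**3 * h * (v - E_NA)   # sodium: gNa * m^3 * h
    i_l = G_L * (v - E_L)                 # leak: constant conductance gL
    return i_k, i_na, i_l


if __name__ == "__main__":
    # Approximate resting-state gating values for the classic model.
    print(ionic_currents(v=-65.0, n=0.32, m=0.05, h=0.6))
```

      Each term vanishes when the membrane potential equals the corresponding reversal potential. The linear driving-force form (v - E_ion) is the part of the model that the reviewer's near-equilibrium remark in the following paragraph refers to.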

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144)

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond to predominantly membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells by our own group and A.Prindle (J.A.Blee et al, ‘Spatial propagation of electrical signal in circular biofilms’, Physical Review E, 2019, 100, 052401; J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001; A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that displayed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were only counted after the entire duration. We believe that the wildtype handles the light stress better than the ∆kch mutant as measured with the PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae, which were starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics but is attributable to PI showing damaged/dead cells without bias. We think that propidium iodide is good for this experiment. Propidium iodide is a dye that is extensively used in life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the 2nd peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
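
      The reviewer's point about outlier sensitivity can be illustrated with a small sketch. The times below are invented for illustration and are not the data behind Figure 1B or 1E; they only show how a few late-peaking cells pull the mean above the typical value while the median barely moves.

```python
from statistics import mean, median

# Hypothetical time-to-first-peak values in minutes (NOT the authors' data):
# most cells peak near 6 min, two outliers peak much later.
sparse_cells = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 14.9, 16.2]


def summarize(times):
    """Return the mean and median of a list of peak times."""
    return {"mean": mean(times), "median": median(times)}


if __name__ == "__main__":
    print("all cells:       ", summarize(sparse_cells))
    # Dropping the two latest cells shows how strongly they drive the mean.
    print("without outliers:", summarize(sorted(sparse_cells)[:-2]))
```

      A rank-based comparison between the sparse and clustered conditions would then be a natural follow-up, although the choice of test is left open here.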

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We appreciate the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics was not influenced by cell density (see Fig 2C). 

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (supplementary figure 4B). The fluorescence data were carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. Also it should be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the 2nd peak in Fig S4B. The first peak, quiescence and slow rise to the second peak are evident.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Scientific recommendations:

      - Although Fig 4A clearly shows that light stimulation has an influence on the dynamics of cell membrane potential in the biofilm, it is important to rule out the contribution of variations in environmental parameters. I understand that for technical reasons, the flow of fresh medium must be stopped during image acquisition. Therefore, I suggest performing control experiments, where the flow is stopped before image acquisition (15min, 30min, 45min, and 1h before). If there is no significant contribution from environmental variations (pH, RedOx), the dynamics of the electrical response should be superimposed whatever the delay between stopping the flow and switching on the light.

      In this study, we focused on how E. coli cells and biofilms react to blue light stress via their membrane potential dynamics. This involved growing the cells and biofilms, stopping the media flow, and obtaining data immediately. We believe that stopping the flow not only helped us to manage data acquisition, but also helped us reduce the effect of environmental factors. In a future study we will expand the work to include how the membrane potential dynamics evolve in the presence of changing environmental factors, for example those induced by stopping the flow at varied times.

      - Since TMRM signal exhibits a linear increase after the first response peak (Supplementary Figure 1D), I recommend mitigating the statement at line 78.

      - To improve the spatial analysis of the electrical response, I suggest plotting kymographs of the intensity profiles across the biofilm. I have plotted this kymograph for Video S3 and it appears that there is no electrical propagation for the second peak. In addition, the authors should provide technical details of how R^2(t) is measured in the first regime (Figure 7E).

      See the dedicated simulation article for more details. https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Line 152: To assess the variability of the latency, the authors should consider measuring the variance divided by the mean instead of SD, which may depend on the average value.

      We are happy with our current use of standard error on the standard deviation. It shows what we claim to be true.

      - Line 154-155: To truly determine whether the amplitude of the "action potential" is independent of biofilm size, the authors should not normalise the signals.

      Good point. We qualitatively compared both normalized and unnormalized data. Recent electrical impedance spectroscopy measurements (unpublished) indicate that the electrical activity is an extensive quantity i.e. it scales with the size of the biofilms.

      - To clarify the role of K+ in the habituation response, I suggest using valinomycin at sub-inhibitory concentrations (10µM). Besides, the high concentration of CCCP used in this study completely inhibits cell activity. Not surprisingly, no electrical response to light stimulation was observed in the presence of CCCP. Finally, the Kch complementation experiment exhibits a "drop after the first peak" on a single point. It would be more convincing to increase the temporal resolution (1min->10s) to show that there is indeed a first and a second peak.

      An interesting experiment for the future.

      - Line 237-238: There are only two points suggesting that the dynamics of hyperpolarization are faster at higher irradiance (Fig 4A). The authors should consider adding a third intermediate point at 17µW/mm^2 to confirm the statement made in this sentence.

      Multiple repeats were performed. We are confident of the robustness of our data.

      - Line 249 + Fig 4E: It seems that the data reported on Fig 4E are extracted from Fig 4D. If this is indeed the case, the data should be normalised by the total population size to compare survival probabilities under the two conditions. It would also be great to measure these probabilities (for WT and ∆kch) in the presence of ROS scavengers.

      - To distinguish between model fitting and model predictions, the authors should clearly state which parameters are taken from the literature and which parameters are adjusted to fit the experimental data.

      - Supplementary Figure 4A: why can't we see any wavefront in this series of images?

      For the experimental data, the wavefront was analyzed using the Imaris software. We systematically created an ROI with a curved geometry within the confocal stack (the biofilm). The fluorescence of ThT traced along the surface of the curved geometry was analyzed along the z-axis.

      - Fig 7B: Could the authors explain why the plateau is higher in the simulations than in the biofilm experiments? Could they add noise on the firing activities?

      See the dedicated Martorelli modelling article. In general we would need to approach stochastic Hodgkin-Huxley modelling, and the fluorescence data (and electrical impedance spectroscopy data) presented do not have extensive noise (due to collective averaging over many bacterial cells).

      - Supplementary Figure 4B: Why can't we see the second peak in confocal images?

      The second peak is present although not as robust as in Fig 2B. The confocal images were obtained with a laser source. Therefore we tried to create a balance between applying sufficient light stress on the bacterial cells and mitigating photobleaching.
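
      Regarding the Line 152 comment above on latency variability, the following is a minimal sketch of why the reviewer prefers a mean-normalized dispersion measure over the raw SD. The latency values are hypothetical and chosen only for illustration; they are not taken from the manuscript. The function name `dispersion_summary` is likewise an assumption.

```python
import numpy as np

def dispersion_summary(latencies):
    """Return mean, sample SD, coefficient of variation, and Fano factor."""
    x = np.asarray(latencies, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)           # sample variance
    sd = np.sqrt(var)             # sample standard deviation
    return {
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,          # SD divided by the mean (dimensionless)
        "fano": var / mean,       # variance divided by the mean
    }

# Hypothetical first-spike latencies in minutes (illustrative only).
sparse_cells = [2.0, 3.0, 4.0, 5.0, 6.0]
microcluster = [1.0, 1.5, 2.0, 2.5, 3.0]   # same shape, half the time scale

a = dispersion_summary(sparse_cells)
b = dispersion_summary(microcluster)

for name, s in (("sparse cells", a), ("microcluster", b)):
    print(f"{name}: mean={s['mean']:.2f} sd={s['sd']:.3f} "
          f"cv={s['cv']:.3f} fano={s['fano']:.4f}")
```

      In this toy case the SD differs by a factor of two between the data sets purely because the mean differs, while the coefficient of variation is identical. The variance-to-mean ratio suggested by the reviewer still carries units of time, so whether it or the coefficient of variation is the better choice depends on the claim being tested.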

      Editing recommendations:

      The editing recommendations below have been applied where appropriate.

      - Many important technical details are missing (e.g. R^2, curvature, and 445nm irradiance measurements). Error bars are missing from most graphs. The captions should clearly indicate if these are single-cell or biofilm experiments, strain name, illumination conditions, number of experiments, SD, or SE. Please indicate on all panels of all figures in the main text and in the supplements, which are the conditions: single cell vs. biofilm, strains, medium, centrifugal vs centripetal etc..., where relevant. Please also draw error bars everywhere.

      We have now made appropriate changes. We specifically use cells when we were dealing with single cells and biofilms when we worked on biofilms. We decided to describe the strain name either on the panel or the image description.

      - Line 47-51: The way the paragraph is written suggests that no coordinated electrical oscillations have been observed in Gram-negative biofilms. However, Hennes et al (referenced as 57 in this manuscript) have shown that a wave of hyperpolarized cells propagates in a colony of Neisseria gonorrhoeae, which is a Gram-negative bacterium.

      We are now aware of this work. It was not published when we first submitted our work and the authors claim the waves of activity are due to ROS diffusion NOT propagating waves of ions (coordinated electrical wavefronts).

      - Line 59: "stressor" -> "stress" or "perturbation".

      The correction has been made.

      - Line 153: Please indicate in the Material&Methods how the size of the biofilm is measured.

      The biofilm size was obtained using BiofilmQ, and the step-by-step guide for using BiofilmQ was stated.

      - Figure 2A: Please provide associated brightfield images to locate bacteria.

      - Line 186: Please remove "wavefront" from the caption. Fig2B only shows the average signal as a function of time.

      This correction has been implemented.

      - Fig 3B,C: Please indicate single cell and biofilm on the panels and also WT and ∆kch.

      - Line 289: I suggest adding "in single cell experiments" to the title of this section.

      - Fig 5A: blue light is always present at regular time intervals during regime I and II. The presence of blue light only in regime I could be misleading.

      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. The curve given by the model, should be compared with the average curve presented in Fig 1D.

      - Fig 6A, B, and C: These figures could be moved to supplements.

      - Line 392: Replace "turgidity" with "turgor pressure".

      - Fig 7C,E: Please use a log-log scale to represent these data and indicate the line of slope 1.

      - Fig 7E: The x-axis has been cropped.

      - Please provide a supplementary movie for the data presented in Fig 7E.

      - Line 455: E. coli biofilms do not express ThT.

      - Line 466: "\gamma is the anomalous exponent". Please remove anomalous (\gamma can equal 1 at this stage).

      - Line 475: Please replace "section" with "projection".

      - Line 476: Please replace "spatiotemporal" with "temporal". There is no spatial dependency in either figure.

      - Line 500: Please define Eikonal approximation.

      - Fig 8 could be moved to supplements.

      - Line 553: "predicted" -> "predict".

      - Line 593: Could the authors explain why their model offers much better quantitative agreement?

      - Line 669: What does "universal" mean in that context?

      - Line 671: A volume can be pipetted but not a concentration.

      - Line 676: Are triplicates technical or biological replicates?

      - Sup Fig1: Please use minutes instead of seconds in panel A.

      - Model for membrane dynamics: "The fraction of time the Q+ channel is open" -> "The dynamics of Q+ channel activity can be written". Ditto for K+ channel...

      - Model for membrane dynamics: "the term ... is a threshold-linear". This function is not linear at all. Why is it called linear? Also, please describe what \sigma is.

      - ABFDF model: "releasing a given concentration" -> "releasing a local concentration" or "a given number" but it's not \sigma anymore. Besides, this \sigma is unlikely related to the previous \sigma used in the model of membrane potential dynamics in single cells. Please consider renaming one or the other. Also, ions are referred to as C+ in the text and C in equation 8. Am I missing something?
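
      The requests above for a log-log representation of Fig 7C,E with a reference line of slope 1, and for details on how the exponent \gamma is obtained, can be illustrated with a short sketch. It assumes a power-law form R^2(t) = A t^\gamma, which is an assumption made here for illustration and may not match the authors' fitting procedure. The data are synthetic and the function name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(t, r2):
    """Fit r2 = A * t**gamma by linear least squares in log-log space.

    Returns (gamma, A). Assumes all t and r2 values are strictly positive.
    """
    log_t = np.log(np.asarray(t, dtype=float))
    log_r2 = np.log(np.asarray(r2, dtype=float))
    gamma, log_a = np.polyfit(log_t, log_r2, 1)   # slope, intercept
    return gamma, np.exp(log_a)

# Synthetic example: a front whose squared radius grows linearly in time.
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
r2 = 3.0 * t ** 1.0

gamma, prefactor = fit_power_law(t, r2)
print(f"gamma = {gamma:.3f}, prefactor = {prefactor:.3f}")
```

      On a log-log plot, data with \gamma = 1 run parallel to the slope-1 guide line, so any departure from that line is visible at a glance. This is why the reviewer asks that \gamma not be called "anomalous" before the fitted value is shown to differ from 1.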

      Reviewer #2 (Recommendations For The Authors):

      I have included all my comments as one review. I have done so, despite the fact that some minor comments could have gone into this section, because I decided to review each Result section. I thus felt that not writing it as one review might be harder to follow. I have however highlighted which comments are minor suggestions or where I felt corrections.

      However, while I am happy with all my comments being public, given their nature I think they should be shown to authors first. Perhaps the authors want to go over them and think about it before deciding if they are happy for their manuscript to be published along with these comments, or not. I will highlight this in an email to the editor. I question whether in this case, given that I am raising major issues, publishing both the manuscript and the comments is the way to go as I think it might just generate confusion among the audience.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find any legends for any of the supplemental videos in my review materials, and I could not open supplemental video 5.

      I made some comments in the public review about the analysis and interpretation of the time-to-fire data. One of the other challenges in this data set is that the time resolution is limited- it seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution on this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance where the first spike time is increased (Figure 4A).

      In the public review, I mentioned the possible impact of high membrane potential on PI permeability. To address this, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      In the public review, I mentioned the possible combined toxicity of ThT and blue light. Live/dead experiments after blue light exposure with and without ThT could be used to test for such effects, and/or the growth curve experiment in Figure 1F could be repeated with blue light exposure at a comparable irradiance used in the experiment.

      Throughout the paper and figure legends, it would help to have more methodological details in the main text, especially those that are critical for the interpretation of the experiment. The experimental details in the methods section are nicely described, but the data analysis section should be expanded significantly.

      At the end of the results section, the authors suggest a critical biofilm size of only 4 µm for wavefront propagation (not much larger than a single cell!). The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger. Are there data for cell clusters above and below this size that could support this claim more directly?

      The authors mention image registration as part of their analysis pipeline, but the 3D data sets in Video S6B and Fig. S4A do not appear to be registered- were these registered prior to the velocity analysis reported in Fig. 8?

      One of the most challenging claims to demonstrate in this paper is that these membrane potential wavefronts are involved in coordinating a large, biofilm-scale response to blue light. One possible way to test this might be to repeat the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the Kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions.

      Line 140: How is "mature biofilm" defined? Also on this same line, what does "spontaneous" mean here?

      Line 151: "much smaller": Given that the reported time for 3D biofilms is 2.73 ± 0.85 min and in microclusters is 3.27 ± 1.77 min, this seems overly strong.

      Line 155: How is "biofilm density" characterized? Additionally, the data in Figure 2C are presented in distance units (µm), but the text refers to "areal coverage"- please define the meaning of these distance units in the legend and/or here in the text (is this the average radius?).

      Lines 161-162: These claims seem strong given the data presented before, and the logic is not very explicit. For example, in the second sentence, the idea that this signaling is used to "coordinate long-range responses to light stress" does not seem strongly evidenced at this point in the paper. What is meant by a long-range response to light stress- are there processes to respond to light that occur at long-length scales (rather than on the single-cell scale)? If so, is there evidence that these membrane potential changes could induce these responses? Please clarify the logic behind these conclusions.

      Lines 235-236: In the lower irradiance conditions, the responses are slower overall, and it looks like the ThT intensity is beginning to rise at the end of the measurement. Could a more prominent second peak be observed in these cases if the measurement time was extended?

      Line 242-243: The overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second peak of potassium are different than those observed by ThT (it rises some minutes earlier)- is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also reflect the first peak- is this surprising given that the Kch channel has no effect on this peak?

      Line 255-256: Again, this seems like a very strong claim. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential, but does not obviously indicate that these membrane potential fluctuations mitigate ROS levels or help the cells respond to ROS stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no-light control I mention in the public review.

      Lines 313-315: "The model predicts... the external light stress". Please clarify this section. First, where does this prediction arise from in the modeling work? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      Line 322: I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later in lines 327-8 the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      Line 365-366: This section title seems confusing- mechanosensitive ion channels totally ablate membrane potential dynamics, they don't have a specific effect on the first hyperpolarization event. The claim that mechanosensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants- is this expected? This seems to imply that these ion channels also have a blue light independent effect.

      Lines 368, 371: Should be VGCCs rather than VGGCs.

      Line 477: I believe the figure reference here should be to Figure 7B, not 6B.

      Line 567-568: "The initial spike is key to registering the presence of the light stress." What is the evidence for this claim?

      Line 592-594: "We have presented much better quantitative agreement..." This is a strong claim; it is not immediately evident to me that the agreement between model and prediction is "much better" in this work than in the cited work. The model in Figure 4 of reference 57 seems to capture the key features of their data. Clarification is needed about this claim.

      Line 613: "...strains did not have any additional mutations." This seems to imply that whole genome sequencing was performed- is this the case?

      Line 627: I believe this should refer to Figure S2A-B rather than S1.

      Line 719: What percentage of cells did not hyperpolarize in these experiments?

      Lines 751-754: As I mentioned above, significant detail is missing here about how these measurements were made. How is "radius" defined in 3D biofilms like the one shown in Video S6B, which looks very flat? What is meant by the distance from the substrate to the core, since usually in this biofilm geometry, the core is directly on the substrate? Most importantly, this only describes the process of sectioning the data- how were these sections used to compute the velocity of ThT signal propagation?

      I also have some comments specifically on the figure presentation:

      Normalization from 0 to 1 has been done in some of the ThT traces in the paper, but not all. The claims in the paper would be easiest to evaluate if the non-normalized data were shown- this is important for the interpretation of some of the claims.

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Throughout the paper, I am a bit confused by the time axis; the data consistently starts at 1 minute. This is not intuitive to me, because it seems that the blue light being applied to the cells is also the excitation laser for ThT- in that case, shouldn't the first imaging frame be at time 0 (when the blue light is first applied)? Or is there an additional exposure of blue light 1 minute before imaging starts? This is consequential because it impacts the measured time to the first spike. (Additionally, all of the video time stamps start at 0).

      Please increase the size of the scale bars and bar labels throughout, especially in Figure 2A and S4A.

      In Figure 1B and D, it would help to decrease the opacity on the individual traces so that more of them can be discerned. It would also improve clarity to have data from the different experiments shown with different colored lines, so that variability between experiments can be clearly visualized.

      Results in Figure 1E would be easier to interpret if the frequency were normalized to total N. It is hard to tell from this graph whether the edges and bin widths are the same between the data sets, but if not, they should be. Also, it would help to reduce the opacity of the sparse cell data set so that the full microcluster data set can be seen as well.
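
      The normalization suggested for Figure 1E can be sketched as follows: both data sets are binned on one shared set of edges and each count is divided by that data set's total N, so the bars are fractions and are directly comparable. The values below are hypothetical, and the function name `normalized_histograms` is an assumption introduced for illustration.

```python
import numpy as np

def normalized_histograms(a, b, n_bins=10):
    """Histogram two samples on shared bin edges, normalized to each sample's N."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared edges spanning the pooled range of both samples.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=n_bins)
    counts_a, _ = np.histogram(a, bins=edges)
    counts_b, _ = np.histogram(b, bins=edges)
    return edges, counts_a / a.size, counts_b / b.size

# Hypothetical time-to-first-spike values in minutes (illustrative only).
sparse_cells = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0]
microcluster = [1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 6.0, 7.0, 9.0]

edges, frac_sparse, frac_cluster = normalized_histograms(
    sparse_cells, microcluster, n_bins=4
)
print("edges:", edges)
print("sparse fraction per bin:", frac_sparse)
print("microcluster fraction per bin:", frac_cluster)
```

      Because the edges are computed from the pooled data, every value falls in some bin and each set of fractions sums to 1 regardless of how many cells each experiment contains.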

      Biofilm images are shown in Figures 2A, S3A, and Video S3- these are all of the same biofilm. Why not take the opportunity to show different experimental replicates in these different figures? The same goes for Figure S4A and Video S6B, which again are of the same biofilm.

      Figure 2C would be much easier to read if the curves were colored in order of their size; the same is true for Figure 4A and irradiance.

      The complementation data in Figure S3D should be moved to the main text figure 3 alongside the data about the corresponding knockout to make it easier to compare the curves.

      Figure S3E: Is the Y-axis in this graph mislabeled? It is labeled as ThT fluorescence, but it seems that it is reporting fluorescence from the calcium indicator?

      Video S6B is very confusing- why does the video play first forwards and then backwards? Unless I am looking very carefully at the time stamps it is easy to misinterpret this as a rise in the intensity at the end of the experiment. Without a video legend, it's hard to understand this, but I think it would be much more straightforward to interpret if it only played forward. (Also, why is this video labeled 6B when there is no video 6A?)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Plasmacytoid dendritic cells (pDCs) represent a specialized subset of dendritic cells (DCs) known for their role in producing type I interferons (IFN-I) in response to viral infections. It was believed that pDCs originated from common DC progenitors (CDP). However, recent studies by Rodrigues et al. (Nature Immunology, 2018) and Dress et al. (Nature Immunology, 2019) have challenged this perspective, proposing that pDCs predominantly develop from lymphoid progenitors expressing IL-7R and Ly6D. A minor subset of pDCs arising from CDP has also been identified as functionally distinct, exhibiting reduced IFN-I production but a strong capability to activate T-cell responses. On the other hand, clonal lineage tracing experiments, as recently reported by Feng et al. (Immunity, 2022), have demonstrated a shared origin between pDCs and conventional DCs (cDCs), suggesting a contribution of common DC precursors to the pDC lineage.

      In this context, Araujo et al. investigated the heterogeneity of pDCs in terms of both development and function. Their findings revealed that approximately 20% of pDCs originate from lymphoid progenitors common to B cells. Using Mb1-Cre x Bcl11a floxed mice, the authors demonstrated that the development of this subset of pDCs, referred to as "B-pDCs," relied on the transcription factor BCL11a. Functionally, B-pDCs exhibited a diminished capacity to produce IFN-I in response to TLR9 agonists but secreted more IL-12 compared to conventional pDCs. Moreover, B-pDCs, either spontaneously or upon activation, exhibited increased expression of activation markers (CD80/CD86/MHC-II) and a heightened ability to activate T-cell responses in vitro compared to conventional pDCs. Finally, Araujo et al. characterized these B-pDCs at the transcriptomic level using bulk and single-cell RNA sequencing, revealing them as a unique subset of pDCs expressing certain B cell markers such as Mb1, as well as specific markers (Axl) associated with cells recently described as transitional DCs.

      Thus, in contrast to previous findings, this study posits that a small proportion of pDCs derive from B cell-committed lymphoid progenitors, and this subset of B-pDCs exhibits distinct functional characteristics, being less specialized in IFN-I production but rather in T cell activation.

      Strengths:

      Previously, the same research group delineated the significance of BCL11a as a critical transcription factor in pDC development (Ippolito et al., PNAS, 2014). This study elucidates the precise stage during hematopoiesis at which BCL11a expression becomes essential for the emergence of a distinct subset of pDCs, substantiated by robust genetic evidence in vivo. Furthermore, it underscores the shared developmental origin between pDCs and B cells, reinforcing prior research in the field that suggests a lymphoid origin of pDCs. Finally, this work attributes specific functional properties to pDCs originating from these lymphoid progenitors shared with B cells, emphasizing the early imprinting of functional heterogeneity during their development.

      Weaknesses:

      The authors delineate a subset of pDCs dependent on the BCL11a transcription factor, originating from lymphoid progenitors, and compare it to conventional pDCs, which they suggest differentiate from common DC progenitors of myeloid origin. However, this interpretation lacks support from the authors' data. Their single-cell RNA sequencing data identifies cells corresponding to progenitors (Prog2), from which the majority of pDCs, termed conventional pDCs, likely originate. This progenitor cell population expresses Il7r, Siglech, and Ly6D, but not Csfr1. The authors describe this progenitor as resembling a "pro-pDC myeloid precursor," yet these cells align more closely with lymphoid (Il7r+) progenitors described by Rodrigues et al. (Nature Immunology, 2018) and Dress et al. (Nature Immunology, 2019). Furthermore, analysis of their Mb1 reporter mice reveals that only a fraction of common lymphoid progenitors (CLP) express YFP, giving rise to a fraction of YFP+ pDCs. However, this does not exclude the possibility that YFP- CLP could also give rise to pDCs. The authors could address this caveat by attempting to differentiate pDCs from both YFP+ and YFP- CLPs in vitro in the presence of FLT3L. Additionally, transfer experiments using these lymphoid progenitors could be conducted in vivo to assess their differentiation potential in competitive settings.

      Dear Reviewer 1, we appreciate your thoughtful comments. We made the decision to address the Prog2 cluster as “pro-pDC myeloid precursor” because despite its lack of CSFR-1, its CIPR similarity score showed highest transcriptional similarity to the population “SC.CDP.BM” (GEO accession number: GSM791114), which is shown to be Sca1- Flt3+ cKitlo.

      A similar population identified as “common dendritic cell progenitor” is shown by Onai and colleagues (Onai et al. 2013, Immunity) to be capable of differentiating into pDCs by upregulating E2-2 and subsequently downregulating M-CSFR. In addition, we were unable to infer a developmental trajectory between Prog2 and B-pDCs using SimplePPT on Monocle3 (Figure 5B). Since we know our B-pDCs are CLP derived and most likely share a B cell progenitor population, we feel this lack of connectivity to the UMAP myeloid partition corroborates our assignment of Prog2 as a myeloid pDC progenitor (not CLP derived). Of note, recent work by Medina and colleagues has shown that while IL-7Rα knockout mice exhibit a block in B cell development at the all-lymphoid progenitor (ALP) stage, PDCA-1+ pDCs identified within the initially gated BLP population persisted (PLoS One, 2013), suggesting the IL7R chain is not required for the development of PDCA1+ cells. 

      Using their Mb1-reporter mice, the authors demonstrate that YFP pDCs originating from lymphoid progenitors are functionally distinct from conventional pDCs, mostly in vitro, but their in vivo relevance remains unknown. It is crucial to investigate how Bcl11a conditional deficiency in Mb1-expressing cells affects the anti-viral immune response, for example, using the M-CoV infection model as described by Sulczewski et al. in Nature Immunology, 2023. Particularly, the authors suggest that their B-pDCs act as antigen-presenting cells involved in T-cell activation compared to conventional pDCs. However, these findings contrast with those of Rodrigues et al., who have shown that pDCs of myeloid origin are more effective than pDCs of lymphoid origin in activating T-cell responses. The authors should discuss these discrepancies in greater detail. It is also notable that B-pDCs acquire the expression of ID2 (Figure S3A), commonly a marker of conventional/myeloid DCs. The authors could analyze in more detail the acquisition of specific myeloid features (CD11c, CX3CR1) by this B-pDC subset and discuss how the expression of ID2 may impair classical pDC features, as ID2 is a repressor of E2-2, a master regulator of pDC fate.

      Both reviewers expressed the need to further investigate how Bcl11a conditional deficiency in Mb1-expressing cells affects anti-viral responses of B-pDCs. While the functional characterization of B-pDC in the context of infection could be highly informative, it is really outside the scope of the present study. Our discovery that B-pDCs expand robustly upon TLR-9 agonist challenges in vivo and can prime T cells in vitro efficiently, however, suggests that these cells might play an important role during viral infections or anti-cancer immunity.

      Finally, through the analysis of their single-cell RNA sequencing data, the authors show that the subset of B-pDCs they identified expresses Axl, confirmed at the protein level. Given this specific expression profile, the authors suggest that B-pDCs are related to a previously described subset of transitional DCs, which were reported to share a common developmental path with pDCs, (Sulczewski et al. in Nature Immunology, 2023). While intriguing, this observation requires further phenotypic and functional characterization to substantiate this claim.

      We agree with the reviewer’s comments. We are currently preparing a separate manuscript addressing the commonalities between human transitional DCs and murine non-conventional pDCs.

      Reviewer #2 (Public Review):

      Summary:

      The origin of plasmacytoid dendritic cells and their subclasses continues to be a debated field, akin to any immune cell field that is determined through the expression of surface markers (relative to clear subclass separation based on functional biology and experimentation). In this context, in this manuscript by Araujo et al, the authors attempt to demonstrate that a subtype of pDCs comes from lymphoid origin due to the presence of some B cell gene expression markers. They name these cells B-pDCs. Strikingly, pDCs function via expression of IFNa whereas B-pDCs do not express IFNa, thereby raising the question of what their physiological or pathophysiological properties are. B-pDCs also express AXL, a marker not seen in mouse pDCs but observed in human pDCs. Overall, using a combination of gene expression profiling of immune cells isolated from mice via RNA-seq and single-cell profiling the authors propose that B-pDCs are a novel subtype of pDCs in mice that were not previously identified and characterized.

      Weaknesses:

      My two points of discussion about this manuscript are as follows.

      (1) How new are these observations that pDCs could also originate from common lymphoid progenitors? This fact has been previously outlined by many laboratories including Shigematsu et al, Immunity 2004. These studies in the manuscript can be considered new based on the single-cell profiling presented, only if the further characterization of the isolated B-pDCs is performed at the functional biology level. Overlapping gene expression profiles are often seen in developing immune cell types, especially when only evaluated at the RNA expression level, and can lead to cell type complexity (and identification of new cell types) that are not biologically and functionally relevant.

      Dear reviewer 2, we appreciate your thoughtful comments. We believe our single-cell sequencing analysis adds new information to the studies mentioned because of our broader approach to BM profiling. By using only one marker (PDCA1+), scRNA-seq allowed us not only to dissect several subpopulations of pDCs that to our knowledge were not previously dissected in mice, but also to link the transcriptional similarity of B-pDCs to myeloid-derived pDCs (and even other myeloid cell types), as well as B cells.

      (2) The authors hardly perform any experiments to interrogate the function of these B-pDCs. The discussion on this topic can be enhanced. Ideally, some biological experiments would confirm that B-pDCs are important.

      Dear reviewer 2, we appreciate your thoughtful comment and agree about the need for further functional characterization of B-pDCs (please see comments directed to reviewer 1 above).

      (1) Considering that Bcl11a conditional deficiency severely impacts the B cell lineage, there is a possibility that such an effect on B cells may indirectly influence pDC development. To address this, the authors could repeat their bone marrow transfer experiments in a competitive setting by mixing both Bcl11a WT and CKO BM cells (using congenic markers to track the origin of the BM cells) and then specifically assess whether BM cells originating from Bcl11a CKO donors have impaired pDC output.

      Dear reviewer 2, while the comment above is valid (that the reduced number of mature B cells in our Bcl11a conditional knockout might indirectly impact B-pDC development), we and many others have previously shown that lack of transcriptional regulation of E2-2 and other pDC differentiation modulators by Bcl11a (including ID2 and MTG16) intrinsically and selectively disrupts the pDC lineage. At the current stage, we feel that rederiving Bcl11a cKOs and performing bone marrow transfers (which usually take several months) only to investigate indirect effects of B cells on pDC development is outside the scope of this publication.

      (2) As mentioned earlier, it is important to assess the potential of CLP, whether YFP- or YFP+, in their ability to give rise to pDCs both in vitro and in vivo. This is also crucial since the authors previously demonstrated that Bcl11a deficiency in all hematopoietic cells had a more drastic impact on pDC development than mb1-cre specific deficiency.

      We agree the manuscript could be strengthened by differentiation experiments. However, in our previous publication (mentioned above by the reviewer), we specifically show that although fewer overall LSK progenitors were detected in Vav-Cre+ F/F mice, both MDP and CDP progenitor populations persisted within the Flt3+ compartment in cKO mice at percentages similar to controls. MDP (Lin– Flt3+ Sca-1− CD115+ c-kithi); CDP (Lin– Flt3+ Sca-1− CD115+ c-kitlo). These data confirm that CLPs give rise to a substantial pool of pDC subpopulations. Other works have shown this as well, both in vivo and in vitro (Wang et al., Immunity 2004; Karsunky et al., JEM 2003, etc.). We therefore feel that confirming the previous observations that CLPs can give rise to pDCs is unnecessary, as our main goal in this manuscript was to describe a new pDC subpopulation that emerges primarily from CD79a+ B cell biased progenitors.

      (3) The authors show a more severe impact of Bcl11a CKO on pDC depletion in the spleen than in the BM. Is this effect specific to the spleen, or can it also be observed in lymph nodes? What is the overall impact of Bcl11a conditional deficiency on pDC distribution in tissues such as the liver and lung? These questions are important to address to understand whether the heterogeneity of pDCs is differentially affected by their localization.

      We agree heterogeneity of pDCs can be affected by their microenvironment. Although phenotyping of lymph nodes in Bcl11a cKOs would greatly add to our manuscript, the genetically altered strains required are no longer being bred in our facility and resurrecting them from frozen sperm is outside the realm of this publication.

      (4) Regarding the functional study of pDCs, as emphasized previously, it is important to assess the in vivo relevance of B-pDCs in infectious settings.

      Dear reviewer 2, we appreciate your thoughtful comment. Please see our response directed to reviewer 1 above.

      (5) The authors injected CpG-ODN into mice and analyzed pDC phenotype upon activation. It is important to note that upon activation, especially upon induction of IFN-I production in vivo, mPDCA1 expression is no longer specific to pDCs  (Blasius et al, Journal of Immunology, 2006). Therefore, to specifically characterize pDC phenotype upon activation, a differential gating strategy is required (CD11c, B220, Ly6C, and Siglec H) to ensure that bona fide pDCs are analyzed.

      We agree with the reviewer that this would be a more appropriate characterization. Regarding PDCA1 promiscuity in activated states, we are not aware of any cell types that express very high levels of B220 and PDCA1 simultaneously other than pDCs. We therefore firmly believe that our assignment is valid. Interestingly, gating B220+ cells of CpG-challenged mice that show intermediate expression of PDCA1 results in an increase in the frequency of CD19+ B cells, which we were careful to avoid by gating only the cells that most strongly express PDCA1.

      (6) How does pDC activation regulate their mb1 expression? Could conventional pDCs, upon activation, become B-PDCs? Could activation and induction of IFN-I production in vivo also affect CLP and increase the amount of YFP+ lymphoid progenitors and thus B-pDC output?

      Dear reviewer, we agree with your concern, although it is beyond the scope of the present study. While changes in YFP MFI via flow cytometry upon vaccination were not substantial, we have included the following comment in the manuscript discussion, acknowledging the aforementioned possibility: “Of note, whether induction of IFN-I production in vivo could also affect CLP and increase the amount of YFP+ lymphoid progenitors and thus B-pDC output is unclear. Further research is required to answer this question.”

      (7) If pDCs are preferentially expanding upon in vivo stimulation, it would be informative to assess their Ki67 profile. This is a surprising observation since pDCs are generally considered quiescent cells that were previously described to die in response to activation and IFN-I (Swiecki et al, Journal of Experimental Medicine, 2011).

      We agree and have entered the following statement to address this concern: “Functionally, they expand more readily after TLR9 engagement than classical pDCs (either through increased proliferation or differentiation of other cell types) and excel at activating T cells in culture.”

      (8) How does the conditional deficiency of BCL11a affect the production of IFN-I and IL-12 in vivo (serum) upon CpG-ODN stimulation?

      Dear reviewer 2, we are currently unable to rederive the conditional knockout mouse strain in a timely fashion. However, our ELISA experiments performed under controlled in vitro activation conditions, along with the in vivo findings of Zhang et al. (PNAS 2017), warrant the hypothesis that B-pDCs most likely exhibit a similar cytokine-secreting profile under inflammatory conditions.

      (9) Given that B-PDCs show downregulation of pDC canonical markers, including IRF8 and TLR7, could the authors address how B-PDCs respond to TLR7 stimulation in vitro and assess a broader spectrum of cytokines produced by pDCs in response to such stimulation (IL-6, TNFa, CXCL10...)?

      Dear reviewer 2, although expanding our findings to include B-pDC responses to TLR-7 stimulation would greatly enhance our manuscript, a technical obstacle stands in our way. As mentioned previously, sorting B-pDCs for new experiments using reporter YFP mice is currently not possible, as we have retired this mouse strain. Sorting of live CD79a+ B-pDCs via FACS is also not feasible, as CD79a staining with most antibody clones requires permeabilization of cells for easier access to the intra-membrane portion of CD79a.

      (10) It would be informative to compare scRNA sequencing data between control and Bcl11a CKO mice to ascertain their contribution to B-PDCs and whether this deficiency may affect other pDC clusters and/or progenitors.

      We are unable to sort B-pDCs for new experiments, as we unfortunately retired the transgenic colony.

      (11) Transitional DCs were reported to give rise to a subset of cDC2. Given that the authors claim that B-PDCs are related to this subset of transitional DCs, could the authors observe any YFP staining in cDC2 upon the generation of their BM chimeras?

      We saw no YFP positivity in CD11c hi cells (cDCs) via flow or through scRNA-seq, indicating CD79a expression is unique in mature B cells and B-pDCs.

      (12) Most of the statistical analysis is done with a Student's t-test. This requires a normal distribution of the sample, which is highly unlikely given the size of the sample. Therefore, the authors should rather use a non-parametric test (Mann-Whitney) to compare their samples.

      We agree and have redone our statistical analyses using a non-parametric test (Mann-Whitney).
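
      As a minimal sketch of what the switch to a Mann-Whitney comparison involves, the following computes the U statistic with midranks and an exact two-sided p-value by enumerating every way of assigning the pooled ranks to the two groups. The function names and the numbers in the usage note are illustrative assumptions, not the authors' analysis code or data, and the enumeration is only practical for small samples.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for sample x, exact two-sided p-value).

    The p-value is the fraction of all group assignments whose U is at
    least as far from its null expectation n1*n2/2 as the observed U.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    u_obs = u_stat(range(n1))
    center = n1 * n2 / 2
    dev_obs = abs(u_obs - center)

    total = 0
    extreme = 0
    for indices in combinations(range(n1 + n2), n1):
        total += 1
        if abs(u_stat(indices) - center) >= dev_obs - 1e-12:
            extreme += 1
    return u_obs, extreme / total
```

      For two fully separated groups of three values each, for example mann_whitney_exact([1, 2, 3], [4, 5, 6]), only 2 of the 20 possible assignments are as extreme as the observed one, so the two-sided p-value cannot fall below 0.1 at that sample size.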

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1)  In the subsets of the γδ T cells that exhibit reduced BLK expression in B6.SAP KO mice, have the authors examined the expression of Lck and/or Fyn?

      The reviewer raises an excellent point. We have included in the revised manuscript additional data on Lck and Fyn expression in our scRNAseq dataset (new Suppl. Fig. 1 and new Suppl. Fig. 4). These data revealed that in contrast to Blk, which appears primarily restricted to the γδT17 clusters, Lck and Fyn exhibit a much broader distribution and lack restriction to specific clusters. We did note that, like Blk, Lck and Fyn transcripts were abundant in SAP-dependent C2 cluster cells. Pseudobulk analysis on the immature clusters revealed that neither Fyn nor Lck expression level differences reached our cut-off of 0.5 log2 FC (log2 FC Blk = 1.06), leading us to conclude that Blk is particularly dependent on SAP. We did note, however, that the magnitude of Lck differential expression was close to the 0.5 log2 FC cut-off and that its expression was increased in B6.SAP-/- γδ T cells (Suppl. Fig. 4). These results have been added to lines 202-212 in the Results section and lines 491-499 in the Discussion section.
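
      To make the cut-off arithmetic concrete, here is a small sketch of how a log2 fold-change threshold of 0.5 translates into linear fold change and how a value such as the reported 1.06 for Blk compares against it. The helper names are hypothetical, and treating the cut-off as an absolute value (either direction of change) is an assumption; the response does not describe the pseudobulk pipeline in that detail.

```python
import math

LOG2FC_CUTOFF = 0.5  # cut-off quoted in the response


def log2_fold_change(mean_a, mean_b):
    """log2 of the ratio of two strictly positive mean expression values."""
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("expression means must be positive")
    return math.log2(mean_a / mean_b)


def reaches_cutoff(log2fc, cutoff=LOG2FC_CUTOFF):
    """True when the magnitude of the change meets the cut-off (assumed two-sided)."""
    return abs(log2fc) >= cutoff


def linear_fold(log2fc):
    """Convert a log2 fold change back to a linear ratio."""
    return 2.0 ** log2fc
```

      Under these assumptions, a log2 FC of 0.5 corresponds to roughly a 1.41-fold difference and 1.06 to roughly a 2.08-fold difference, so reaches_cutoff(1.06) holds while a value just under 0.5 does not.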

      (2)  Does BLK directly associate with SLAMF1 and/or SLAMF6 receptors?

      The reviewer raises an interesting question given previous reports that BLK, LCK, and FYN have all been implicated in γδ T cell development. While SAP has a well-known ability to recruit FYN to SLAMF1 and there is evidence of a similar SAP-mediated recruitment of LCK to SLAMF6, we are not aware of any evidence of a SAP-BLK interaction or of a direct binding of BLK to SLAM family receptors. Future experiments to investigate this possibility are certainly warranted. In the revised ms, we have included additional discussion of these possibilities (lines 491-499).

      (3)  Given the emerging role of γδ T cells in host immunity, it would be useful if the authors could add a discussion of how their findings are relevant in disease conditions such as cancer. 

      We agree and have included new text in the Introduction (lines 37-45). 

      (4)  Delete repeated words in lines 546 and line 553. 

      Thank you—this has been corrected in the revised manuscript.

      Reviewer #2:

      This is a very complete study and requires no additional experimentation. One thing to keep in mind in assessing the ultimate fate of the "ab wannabe cells" is that mechanisms exist to silence the gd TCR as cells differentiate to the DP stage and so their presence as diverted DP cells may not be evident by staining for gdTCR expression - and will only be evident transcriptomically. 

      We appreciate this helpful comment from the reviewer which we will take into consideration in our future experimental design.

      There are a couple of minor points to raise: 

      (1)  Figure 3C is not called out in the text. 

      Thank you—this has been corrected in the revised manuscript.

      (2)  Line 546 - "dependent" is repeated.

      Thank you—this has been corrected in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      We have addressed these concerns in the revised version and mentioned these efforts in our previous response letter. We briefly mention them here again. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA (gray) upon the addition of HGSNAT (red) (Rebuttal figure 1).

      Author response image 1.

      Acetyl-CoA levels in absence and presence of HGSNAT purified in digitonin. Decrease in the levels of 10 M acetyl-CoA was measured in presence of 10 M D-glucosamine and 30 nM HGSNAT at pH 7.5.

      While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active. In addition, we have shown by cryo-EM that the GFP-tagged HGSNAT that we purified in detergent was already bound to the endogenous substrate ACO, an observation also made by Xu et al. Finally, we performed LC-MS on GFP-tagged HGSNAT purified in detergent to detect bound ACO, which could be further removed by dialysis. These results have been included in Figure S9. The endogenous binding of ACO to HGSNAT in detergent suggests that neither the tag nor the detergent is detrimental to the function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      We have already changed this figure in our latest submission. Perhaps the changes made were not obvious while reviewing. We agreed with this reviewer that the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. In the absence of data supporting large movements during the acetyl transfer reaction, old Figure 5 appeared speculative. Hence, we have edited Figure 5 in the revised version of the manuscript based on the observations we made in this study, and different states shown in the figure do not show any conformational changes and only depict acetyl transfer.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles, and a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      In the response to reviewers, the authors mention revised coordinates, but the revised coordinates provided to this reviewer do not reflect the stated changes (I assume a technical error somewhere)

      Perhaps the old coordinates in the deposition system were resubmitted with the revised draft. Nevertheless, we made the changes to the structure suggested by this reviewer in the previous round and have released the new coordinates (PDB ID: 8TU9).

      Is there any evidence for the interprotomer disulfide except for the map? e.g. if it is a disulfide-linked dimer, one should see a shift in mobility on non-reducing vs reducing SDS-PAGE. Without this, the evidence from the map is not conclusive - while the symmetry-related cysteines are nearby to one another, based on the map I could argue that they could just as well be modeled with the cys sidechains reduced and pointing away from one another.

      In addition to building the model based on cryo-EM maps, we have performed FSEC-based thermal melt analysis of the Ala mutation of C334, which is involved in the disulfide at the dimer interface. C334A is still expressed as a dimer, suggesting that C334 is not the only residue stabilizing the dimer. Upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Figure 4-Figure supplement 1 in main manuscript). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric. We have also performed PAGE analysis as suggested by this reviewer and noticed that reducing conditions result in a monomeric protein band (Rebuttal figure 2). While we were revising this manuscript, two other groups published structures of HGSNAT (Xu et al., 2024, Nat. Struct Mol Biol, and Zhao et al., 2024, Nat. Comm). These groups have also identified this disulfide at the dimer interface in their HGSNAT structures. Zhao et al. showed that this disulfide is not crucial for dimerization and also suggested that it can break depending on the conformation of HGSNAT. Our FSEC results agree with this observation.

      Author response image 2.

      Comparison of purified HGSNAT on native and reducing SDS-PAGE. The arrows on both the gels indicate N-GFP-HGSNAT. The two bands on the SDS PAGE are, perhaps, two differentially glycosylated forms of HGSNAT.


      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function. The authors would need to establish an in vitro assay using purified protein and assess the level of Acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by e.g. HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript. However, we see that acetyl-CoA was intrinsically bound to the protein, indicating the ability of GFP-tagged HGSNAT protein to bind the ligand. Upon dialysis, we see release of acetyl-CoA from the protein, which we have confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al., (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability and ability to bind to the co-substrate. As demonstrated by the existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA in the presence of HGSNAT-ACO complex (blue) and apo HGSNAT (red); the difference compared to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active.

      Author response image 3.

      Acetyl-CoA levels in absence and presence of HGSNAT purified in digitonin. Decrease in the levels of 10 mM acetyl-CoA was measured in presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. Yes, the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to monitor the substrate-induced conformational dynamics within HGSNAT experimentally. In the absence of data supporting large movements during the acetyl transfer reaction, Figure 5 is speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations we made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, it is likely that the ACOS is accessible via lumen and cytosol to varying degrees, as evidenced by MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily undergo isomerization between open-to-lumen and open-to-cytosol conformations during acetyl transfer. Likely, ACOS is always accessible from both the lumen and cytosol, but depending on the substrates or products bound, the accessibility could be limited to either the lysosomal lumen or cytosol. We have rewritten all the statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. It is still expressed as a dimer (Fig S8E). However, upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Fig S8I and S8K). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were made based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we have also performed mutagenesis analysis to study the stability of mutants (Fig S8) and performed LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9) to strengthen parts of the manuscript. We have discussed our findings in the results and modified the discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      H269, as a crucial catalytic residue, was suggested by monitoring the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes by Bame, K. J. and Rome, L. H. (1986). We generated N258I and H269A mutants of HGSNAT and analyzed their stability. We noticed a greater destabilization in N258I compared to H269A (Fig S8). We believe this is because of the loss of ability to bind acetyl-CoA, as the TMs around a catalytic core of the protein in our cryo-EM structure were stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) suggested that they do not observe acetylated histidine in their structure. However, our structure and that reported by Xu et al. (2024) are obtained at cytosolic pH. Perhaps, acetylation of H269 occurs at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All the existing structural data on two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously demonstrate the bisubstrate mechanism of HGSNAT. We believe low pH structural characterization and a detailed kinetic and structural characterization of HGSNAT in membrane mimetics like nanodiscs could provide more insights into the mechanism. However, these studies are a future undertaking and are not a part of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested here, we prepared HGSNAT MPSIIIC variants and tested their expression and stability (please see Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      As per Hrebicek M. et al., 2006 and Feldhammer M. et al., 2009, there are five predicted glycosylation sites: N66, N114, N134, N433, and N602. However, we see evidence for NAG density at N114, N134, and N433. These have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C- alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences starting with "Interestingly, ..."). Making the wording more specific would help the reader, I think.

      We have edited out ‘interestingly’ from the document and have re-written parts of the manuscript, per reviewers’ suggestion, for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 & edited the legend, including these changes, per the reviewers’ suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers (minor points):

      We thank all reviewers for their very helpful suggestions and greatly appreciate their positive evaluation of our work.

      Reviewer #1:

      Ad 1) The reviewer states: Fig 5 While the data very nicely show that CPX and Syt1 have interdependent interactions in the chromaffin neurons, this seems to be not the case in neurons, where the loss of complexins and synaptotagmins have additive effects, suggesting independent mechanisms (eg Xue et al., 2010). This would be a good opportunity to discuss some possible differences between secretion in endocrine cells vs neurons.

      We greatly appreciate the insightful suggestion by the reviewer. To accommodate the reviewer’s suggestion, we now discuss this issue on page 21, line 486-491: “In murine hippocampal neurons, loss of CpxI and Syt1 has additive effects on fast synchronous release, suggesting independent mechanisms (Xue et al., 2010). On the other hand, the same study also showed that Syt1 heterozygosity fails to reduce release probability in wild-type neurons, but does so in the absence of Cpx, again suggesting that Cpx and Syt1 may functionally interact in Ca2+-triggered release.”

      Ad 2) The reviewer states: Fig 8 Shows an apparent shift in Ca sensitivity in N-terminal mutants suggesting a modification of Ca sensitivity of Syt1. Could there be also an alternative mechanism, that explains this phenotype which is based on a role of the n-term lowering the energy barrier for fusion, that in turn shifts corresponding fusion rates to take place at lower Ca saturation levels?

      We fully agree with the reviewer. While our data indicate that Cpx and Syt1 act in a dependent manner in accelerating exocytosis, they do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, line 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+-sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby, lowering the energy barrier of Ca2+-triggered fusion.”

      Reviewer #2:

      Ad 1) The reviewer states: The authors provide a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. With the exception of mammalian renal ribbon synapses (and some earlier RNAi knockdown studies that had off-target effects), there is very little evidence for a "fusion-clamp"-like function of Cplxs in mammalian synapses. At conventional mammalian synapses, genetic loss of Cplx (i.e. KO) consistently decreases AP-evoked release, and generally either also decreases spontaneous release rates or does not affect spontaneous release, which is inconsistent with a "fusion-clamp" theory. This is in stark contrast to invertebrate (D. m. and C. e.) synapses where genetic Cplx loss is generally associated with strong upregulation of spontaneous release, providing support for Cplx acting as a "fusion-clamp".

      We agree with the reviewer that it is difficult to reconcile contradictory findings regarding the role of Cpx in membrane fusion in vertebrates and invertebrates or between murine hippocampal neurons and neuroendocrine cells. On the other hand, we respectfully disagree with the statement of providing a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. In fact, a large number of model systems (in vitro and in vivo studies) support a scenario where complexin takes center stage in clamping of premature vesicle release. For example, in vitro analyses using a liposome fusion assay (Schaub et al., 2006, Nat Struct Mol Biol 13, 748; Schupp et al., 2016) or Hela cells that ectopically express “flipped” SNAREs on their cell surface (Giraudo et al., 2008, JBC 283, 21211) showed that complexin can inhibit the SNARE-driven fusion machinery. Likewise, several studies boosting complexin action by either genetic overexpression or peptide supplementation have provided evidence for the complexin clamp function in neuronal and nonneuronal cells (e.g. Itakura et al., 1999, BBRC 265, 691; Liu et al., 2007, Biochemistry 72, 439; Abderrahmani et al., 2004, J Cell Sci 117, 2239; Archer et al., 2002, JBC 277, 18249; Tang et al, 2006, Cell 126, 1175; Vaithianathan et al., 2013, J Neurosci 33, 8216; Roggero et al., 2007, JBC, 282, 26335.)

      In addition, chromaffin cells enable the investigation of secretion on the background of a well-defined intracellular calcium concentration. Indeed, CplxII knock-out in chromaffin cells demonstrated an enhanced tonic release which is evident at elevated levels of [Ca]i (>100nM), but absent at low resting [Ca]i (Dhara et al., 2014). Given this observation, it is tempting to speculate that variations in [Ca]i among the different preparations may contribute to the deviating expression of the complexin null phenotype in different preparations.

      Ad 2) The reviewer states: The authors use a Semliki Forest virus-based approach to express mutant proteins in chromaffin cells. This strategy leads to a strong protein overexpression (~7-8 fold, Figure 3 Suppl. 1). Therefore, experimental findings under these conditions may not necessarily be identical to findings with normal protein expression levels.

      As shown in Fig. 4, we use the secretion response of wt cells as a control so that we can assess the specificity and quality of the rescue approach in our experiments. In addition, the comparative analysis of the CpxII mutants was performed with respect to the equally overexpressed CpxII wt protein (Fig. 3 Suppl. 1), which we used as a control to determine the standard response under these conditions.

      Ad 3) The reviewer states: Measurements of delta Cm in response to Ca2+ uncaging by ramping [Ca2+] from resting levels up to several µM over a time period of several seconds were used to establish changes in the release rate vs [Ca2+]i relationship. It is not clear to this reviewer if and how concurrently occurring vesicle endocytosis together with a possibly Ca2+-dependent kinetics of endocytosis may affect these measurements.

      By infusing bovine chromaffin cells with 50µM free Ca2+, Smith and Betz showed that the total capacitance increase is dominated by exocytosis and that significant endocytosis only sets in after 3 minutes (Smith and Betz, 1996, Nature, 380, 531). Along the same lines, we previously showed that mouse chromaffin cells (infused with 19µM free calcium over 2 minutes) responded with a robust increase in membrane capacitance, which strongly correlated with the number of simultaneously recorded amperometric events monitoring fusion of single vesicles (Dhara et al., 2014, Fig. 5B). Thus, capacitance alterations recorded under tonic intracellular Ca2+ increase in chromaffin cells are solely due to exocytosis and are not contaminated by significant endocytosis. As our Ca2+ ramp experiments were carried out for 6 seconds and the intracellular free [Ca]i did not exceed 19 µM, the observed phenotypical differences between the experimental groups are most likely due to changes in exocytosis rather than endocytosis.

      Ad 4) The reviewer states: It should be pointed out that an altered "apparent Ca2+ affinity" or "apparent Ca2+ binding rate" does not necessarily reflect changes at Ca2+-binding sites (e.g. Syt1).

      We fully agree with the reviewer’s comment. As pointed out also in the response to reviewer 1, our experiments do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, line 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+-sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby, lowering the energy barrier of Ca2+-triggered fusion.”

      Ad 5) There are alternative models on how Cplx may "clamp" vesicle fusion (see Bera et al. 2022, eLife) or how Cplx may achieve its regulation of transmitter release without mechanistically "clamping" fusion (Neher 2010, Neuron). Since the data presented here cannot rule out such alternative models (in this reviewer's opinion), the authors may want to mention and briefly discuss such alternative models.

      The study by Bera et al. reiterates the model proposed by the Rothman group, which attributes the clamping function of Cpx to its accessory alpha helix by hindering the progressive SNARE complex assembly. We have explicitly stated this issue in the original version of the manuscript (page 19, line 425): “As the accessory helix of Cpx has been found to bind to membrane proximal cytoplasmic regions of SNAP-25 and SybII (Malsam et al., 2012; Bykhovskaia et al., 2013; Vasin et al., 2016), an attractive scenario could be that both domains of CpxII, the CTD and the accessory helix, synergistically cooperate to stall final SNARE assembly”. In this context, we will now also cite the study by Bera et al.

      A related view of the function of complexin suggested that it may act as an allosteric adaptor for sytI (Neher 2010, Neuron). Here, rather than postulating independent "clamp" and "trigger" functions for the dual action of complexin, these were explained as facets of a simple allosteric mechanism by which complexin modulates the Ca2+ dependence of release. Yet, this interpretation appears to be difficult to reconcile with the observations of our and other laboratories, showing that the fusion-promoting and clamping effects are separable (e.g. Dhara et al., 2014; Lai et al., 2014; Makke et al., 2018; Bera et al., 2022).

      Some parts of the Discussion are quite general and not specifically related to the results of the present study. The authors may want to consider shortening those parts.

      Considering the contrary findings in the field of SNARE-regulating proteins, the authors hope that the reviewer will agree that it is necessary to discuss the new observations in a broader context, as also acknowledged by the first reviewer.

      Last but not least, the presentation of the results could be improved to make the data more accessible to non-specialists, this concerns providing necessary background information, choice of colors, and labeling of diagrams.

      Done

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Regarding figures: 

      (1) Please use clearly distinct colors in diagrams. For example, in Figure 2 Suppl. 3, four different shades of red (or reddish) are used to color the traces and the respective bars. These different shades of red are difficult to discriminate. In Figure 5 Suppl. 1, the two greens are nearly indistinguishable.  

      Done

      (2) RRP size and SRP size on the one hand, and SR rate on the other represent different quantities which are measured in different units. Please use a separate y-axis for the SR (a rate measured in fF/s) and do not combine with RRP and SRP (pool sizes measured in fF). This would also automatically alleviate the need for axis breaks in the plots of RRP size and SRP size. In general, please do not use axis breaks which make interpretation of data unnecessarily more complicated.  

      In order to clarify the display, we now define the different units together with the quantified parameter (e.g. RRP [fF], SRP [fF], SR [fF/s]) allowing us to omit a second axis in those subpanels.

      (3) When plotting bar graphs showing mean tau_RRP, mean tau_SRP, and mean delay, please always use the correct y-axis labels, i.e. use "tau_RRP", "tau_SRP" and "delay" as y-axis labels as it was done for example in Figure 4D, and do not use "tau_RRP", "tau_SRP" and "delay" as x-axis labels as it was done for example in Figure 1D and many other figure panels.  

      We have standardized the figure display. Yet, we would prefer to keep our way of labelling subpanels, which states the parameter underneath the bar graph and thereby makes the results more accessible.

      (4) Are the asterisks indicating statistical significance perhaps missing in Figure 4D, middle panel (tau_SRP)?

      There was not a statistically significant difference (wt vs cpxIIko+CpxII EA, P=0.0826, Kruskal-Wallis with Dunn’s post hoc test).

      (5) According to the Results section (pages 12 to 13), I assume that in Figures 6 and 7 the labels "+Cplx XYZ" are used by the authors to identify an overexpression of Cplx XYZ in a Cplx WT background. The legend text reads however " ... cells expressing either Cplx2 wt or the mutant ...", which would not be correct. Please check.

      We have changed the formulations to “overexpression” accordingly.

      (6) The x-axis unit in Figure 8C is likely "µM" and not "M".

      Done.

      (7) The abbreviations "CplxII LL-EE" and "CplxII LL-WW", and "CplxII LLEE" and "CplxII LLWW" are very similar but refer to different mutants. Could you please think of a more specific and unambiguous abbreviation? Perhaps "CplxII L124E-L128E"?  

      We have changed the abbreviations, accordingly (i.e. CpxII L124E-L128E).  

      Regarding the manuscript text:  

      Line 65: "prevents" instead of "impairs"? 

      done

      Line 67: why "in vivo"? 

      We changed the formulation to ‘Several’

      Line 83: "in addition to the clamping function ..." This is misleading. Many of the studies listed here did not provide evidence for enhanced spontaneous release following Cplx loss and often observed the opposite, reduced spontaneous release. The enhanced delayed release was observed by Strenzke et al 2009 J.Neurosci. and by Chang et al. 2015 J.Neurosci. (which the authors may want to cite). However, that enhanced delayed release occurred despite reduced spontaneous release indicating that it is not simply the result of a missing "fusion clamp". 

      To accommodate the reviewer’s suggestion, we have changed the formulation to “Independent of the clamping function of Cpx….”

      Line 104: "speeds up exocytosis that is controlled by the forward rate of Ca2+ binding" This is difficult to understand without context.  

      We have now added the corresponding citations (Voets et al., 2001; Sorensen et al., 2003), which showed that exocytosis timing in chromaffin cells is largely determined by the kinetics of Ca2+-binding to SytI.

      Line 116: "Cplx2 knock out ..." Please provide (here or earlier in the manuscript) information to the reader about which Cplx paralogs are expressed in chromaffin cells.  

      We now state on line 111 that “CpxII is the only Cpx isoform expressed in chromaffin cells (Cai et al., 2008)”

      Line 118: "=~" either "=" or "~". 

      done

      Line 120: "instead" seems superfluous.

      done

      Line 272: "calcium binding rates" should perhaps better read "apparent calcium binding rates". 

      done

      Line 290: "enhancing SytI's Ca2+ affinity" should perhaps better be "enhancing the apparent Ca2+ affinity of the release machinery". Ca2+ binding kinetics is never directly assayed here.

      We agree and have phrased the sentence accordingly.

      Line 300: "Expression of Cplx ... in Syt1 R233Q ki cells, ..." Perhaps better "Overexpression of Cplx ... in Syt1 R233Q ki/Cplx2 wt cells, ..." for clarification?

      done

      Lines 313ff: What is assayed here is the apparent Ca2+ binding kinetics and apparent KD values of the release machinery. Ca2+ binding to Syt1 is never directly measured!  

      We agree and have changed the wording accordingly to “CpxII NTD supports the forward rate of calcium binding to SytI in accelerating exocytosis”

      Line 347: "Complexin plays a dual role ..." This is partially misleading. It does so in chromaffin cells and D.m. and C.e. NMJs but not at conventional mammalian synapses. 

      We agree and have changed the formulation to “In many secretory systems, Complexin plays a dual role in the regulation of SNARE-mediated vesicle fusion”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Introduction to the revised manuscript:

      We thank all three reviewers for their time and insightful comments on our original submission. We are submitting a substantially revised manuscript that includes several new experiments, analyses, discussion points, and clarifications that we believe address all of the main concerns of the reviewers.

      To address the request of Reviewers 2 and 3 to reinforce key findings in a more physiologically intact preparation, we performed recordings of YH-HET SST neurons in brain slices and found that these neurons show impairments in AP generation similar to those observed in YH-HET SST cultured neurons. These data are summarized in a new figure (Fig. 9). Along these lines, we performed additional recordings in cultured neurons at room temperature compared with physiological temperature and found that WT and YH-HET PV neuronal properties were similarly altered by temperature increases, suggesting that our YH variant-induced neuronal phenotypes are not temperature dependent. These data are shown in a new supplemental figure (Supplemental Fig. 4-3). To address concerns of Reviewer 1 regarding our KNa and NaP current recordings, we performed new experiments to further assess the specificity of the VU170 blocker in KNa KO neurons (summarized in Supplemental Fig. 5-2) and to better characterize the time course over which TTX blocks the persistent Na+ current and the KNa current (summarized in Supplemental Fig. 7-1). These latter two experiments provide further clarity and confidence in the accuracy of our measurements of both KNa and NaP currents. Lastly, to address the concern of Reviewer 3 regarding statistical analyses of the modeling data, we’ve added a new table with the results of a repeated measures ANOVA analysis (Supplemental Table 6), and two new figures illustrating the relative changes in each neuron group compared to their controls (Supplemental Figures 6-2 and 7-2). 

      In addition to the new experiments and analyses, we’ve added three new paragraphs to the Discussion section. As the hyperexcitability phenotype in YH-HET PV neurons is somewhat unexpected, we’ve added a paragraph comparing our findings with those found in PV neurons in another KCNT1 GOF model. We’ve also added a paragraph to speculate on the contribution of YH-HET variant-induced alterations in SST and PV neurons to network behavior and seizure propensity. Lastly, we’ve added a paragraph to include the additional limitations and caveats of our study requested by the reviewers.  

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the effects of a heterozygous mutation in the KCNT1 potassium channels on the properties of ion currents and the firing behavior of excitatory and inhibitory neurons in the cortex of mice expressing KCNT1-Y777H. In humans, this mutation as well as multiple other heterozygotic mutations produce very severe early-onset seizures and produce a major disruption of all intellectual function. In contrast, in mice, this heterozygous mutation appears to have no behavioral phenotype or any increased propensity to seizures.

      Regarding the last sentence above, we wanted to clarify a point that we neglected to emphasize in the initial submission. In the Results section from our previous paper (Shore et al., 2020), we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      A relevant phenotype is, however, evident in mice with the homozygous mutation, and the authors have previously published the results of similar experiments with the homozygotes. As perhaps expected, the neuronal effects of the heterozygous mutation presented in this manuscript are generally similar but markedly smaller than the previously published findings on homozygotes. There are, however, some interesting differences, particularly on PV+ interneurons, which appear to be more excitable than wild type in the heterozygotes but less excitable in the homozygotes. This raises the interesting question (which could be more explicitly discussed by the authors) as to whether the reported changes represent homeostatic events that suppress the seizure phenotype in the mouse heterozygotes or simply changes in excitability that do not reach the threshold for behavioral outcomes.

      That is an interesting question. We have added a new paragraph to the Discussion speculating about whether the alterations in SST and PV excitability suppress seizures or do not reach the threshold for behavioral outcomes. This seems to be requested by the second reviewer as well in Weaknesses point #2.

      Strengths and Weaknesses:

      (1) The authors find that the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation.

      We would like to provide a minor clarification to the above statement that “the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation”. In our previous manuscript, we assessed YH-HOM phenotypes in NFS and FS GABAergic neurons, but did not specifically mark PV neurons. Although the YH-HOM FS neurons showed an increase in rheobase and a decrease in AP firing, the magnitudes of these effects were far less than those observed in the NFS population. More importantly, the FS GABAergic population likely consists of PV- and SST-expressing neurons; thus, we cannot directly compare the results from the NFS and FS groups to the PV and SST groups, respectively (please see our response to Weaknesses point #3, Reviewer #2). We apologize for the confusion.

      They propose that this results from the selective upregulation of a persistent sodium current INaP in the PV+ interneurons. While the observations are very interesting, there are three issues concerning this interpretation that should be addressed:

      A) The protocol for measuring the INaP current could potentially lead to results that could be (mis)interpreted in different ways in different cells. First, neither K currents nor Ca currents are blocked in these experiments. Instead, TTX is applied to the cells relatively rapidly (within 1 second) and the ramp protocol is applied immediately thereafter. It is stated that, at this time, Na currents and INaP are fully blocked but that any effects on Na-activated K currents are minimal. In theory, this would allow the pre- to post-difference current to represent a relatively uncontaminated INaP. This would, however, only work if activation of KNa currents following Na entry is very slow, taking many seconds. A good deal of literature has suggested that the kinetics of activation of KNa currents by Na influx vary substantially between cell types, such that single action potentials and single excitatory synaptic events rapidly evoke KNa currents in some cell types. This is, of course, much faster than the time of TTX application. Most importantly, the kinetics of KNa activation may be different in different neuronal types, which would lead to errors that could produce different estimates of INaP in PV+ interneurons vs other cell types.

      First, we’d like to point out that we did not want to block K+ currents (which would also block KNa) when measuring INaP for these experiments, because our hypothesis was that the increased KNa current in YH-HET PV neurons was somehow causing an increase in INaP, and it is possible that this increase depends on an intact KNa. Thus, we decided to use a method based on the observation in our experiments, and previously made by others (Budelli et al., 2009), that the reduction of outward current after TTX addition is slow relative to the rapid reduction in Na+ current. We understand and agree with the reviewer that, if KNa currents were blocked more quickly by TTX in some neuron types than others, then our estimate of INaP using this method would be contaminated in these neuron types, which would lead to inaccurate measurements. To assess this possibility among the main neuron types used in this study, we performed new experiments in which we monitored the time course of INaP block and subsequent IKNa loss following TTX application in PV and SST neurons during slow voltage ramps. We note that action potentials are not present in the slow voltage ramps due to inactivation of the transient Na+ current. These new experiments show that, in SST and PV (both WT and Het) neurons, the block of INaP is nearly complete at the 6s time point, whereas the decay in IKNa is far slower (V50 of ≈ 25s), and importantly, these results do not differ substantially by cell type or genotype. These data suggest that our measurements of INaP are not significantly contaminated by IKNa, and that this method allows for the effective separation of these two currents. These data have been added as a supplemental figure (Supplemental Fig. 7-1) and are briefly described and referenced in the Results section.
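
      The subtraction logic described above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names, conductances, reversal potentials, and Boltzmann parameters are all hypothetical, and the "post-TTX" trace is simulated by removing the persistent Na+ conductance while leaving a chosen fraction of the Na+-activated K+ conductance intact. It shows that the pre-minus-post difference current equals INaP exactly when IKNa has not yet decayed, and that any early IKNa loss adds an outward error to the estimate.

```python
import math

def ramp_voltages(v_start=-80.0, v_end=-20.0, n=61):
    """Evenly spaced command voltages (mV) for a slow ramp (hypothetical range)."""
    step = (v_end - v_start) / (n - 1)
    return [v_start + i * step for i in range(n)]

def boltzmann(v, v_half, slope):
    """Steady-state activation as a Boltzmann function of voltage."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))

def synthetic_ramp_current(voltages, g_nap, g_kna, g_leak=0.5):
    """Toy ramp current (pA, with nS and mV): leak + inward INaP + outward IKNa.

    All parameters are illustrative assumptions, not measured values.
    """
    e_na, e_k, e_leak = 60.0, -90.0, -70.0
    currents = []
    for v in voltages:
        i_leak = g_leak * (v - e_leak)
        i_nap = g_nap * boltzmann(v, -50.0, 5.0) * (v - e_na)
        i_kna = g_kna * boltzmann(v, -30.0, 10.0) * (v - e_k)
        currents.append(i_leak + i_nap + i_kna)
    return currents

def difference_current(pre, post):
    """TTX-sensitive current: pre-TTX trace minus post-TTX trace."""
    return [a - b for a, b in zip(pre, post)]

voltages = ramp_voltages()
pre_ttx = synthetic_ramp_current(voltages, g_nap=1.0, g_kna=2.0)

# Assumption: shortly after TTX, INaP is fully blocked while 90% of IKNa remains.
kna_remaining = 0.9
post_ttx = synthetic_ramp_current(voltages, g_nap=0.0, g_kna=2.0 * kna_remaining)
i_diff = difference_current(pre_ttx, post_ttx)

# Reference: the difference current if IKNa had not decayed at all.
true_inap = difference_current(
    pre_ttx, synthetic_ramp_current(voltages, g_nap=0.0, g_kna=2.0)
)

# Outward error introduced by the assumed early loss of IKNa.
contamination = [d - t for d, t in zip(i_diff, true_inap)]
```

      In this sketch, setting kna_remaining to 1.0 makes the contamination vanish, which corresponds to the case in which the block of INaP is complete well before IKNa begins to decline.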

      B) As the authors recognize, INaP current provides a major source of cytoplasmic sodium ions for the activation. An expected outcome of increased INaP is, therefore, further activation of KNa currents, rather than a compensatory increase in an inward current that counteracts the increase in KNa currents, as is suggested in the discussion.

      We agree that the increase in INaP could theoretically further increase IKNa, as veratridine was previously shown to increase IKNa (Hage & Salkoff, 2012). However, we do not believe that this would necessarily be the case, because as the reviewer notes in their next comment, there is insufficient information on the relative locations of the INaP and KCNT1 channels, as well as the kinetics of sodium transfer to KCNT1 channels, and even less is known in the context of KCNT1 GOF neurons. Thus, there are a couple of plausible reasons why increased INaP may not alter KNa currents in YH-HET PV neurons: (1) In YH-HET PV neurons, the particular sodium channels that are responsible for the increased INaP may not be located within close proximity to the KCNT1 channels. (2) Homeostatic mechanisms that alter the AIS length, or move the AIS further from the soma, in response to altered neuronal excitability are well described (Grubb & Burrone, 2010; Kuba et al., 2010); thus, it is possible that in YH-HET PV neurons, the length or location of the AIS is altered, leading to uncoupling of the sodium channels that are responsible for the increased INaP from the KCNT1 channels.

      C) Numerical simulations, in general, provide a very useful way to evaluate the significance of experimental findings. Nevertheless, while the in-silico modeling suggests that increases in INaP can increase firing rate in models of PV+ neurons, there is as yet insufficient information on the relative locations of the INaP channels and the kinetics of sodium transfer to KNa channels to evaluate the validity of this specific model.

      We completely agree; thus, we have described each of these limitations in the Discussion. We state that the model neurons may “lack more detailed features of ion channels, such as post-translational modifications and subcellular localizations”, and that our KCNT1 model conductance is “hampered by an incomplete understanding of the relationship between Na+ influx, membrane voltage, and channel gating in neurons”.  

      (2) The greatest effect of TTX application would be expected to be the elimination of large transient inward sodium currents. Why are no such currents visible in the control (pre-TTX) or the difference currents (Fig. 2)? Is it possible I missed something in the methods?

      We apologize for the confusion and our mistake in failing to mention this important feature of the displayed traces. To include all of the representative traces in the figures, and prevent overlap of the traces, we removed the large inward sodium currents using the masking tool in Adobe Illustrator in Figure 2 and Supplemental Figure 5-1. We have added that information to the relevant figure legends. We have also provided unmasked images of the representative traces from Figure 2 and Supplemental Figure 5-1 to illustrate the large transient inward sodium currents, and the significant reduction of these currents with TTX treatment.

      (3) As expected, the changes in many of the measured parameters are smaller in the present study with heterozygotes than those previously reported for the homozygous mutation. Some of the statements on the significance of some of the present findings need to be stated more clearly. For example, in the results section describing Fig. 2, it is stated that "In glutamatergic and NFS GABAergic YH-HET neurons, the overall KNa current was increased ...as measured by a significant effect of genotype ...." Later in the same paragraph it is stated that the increases in KNa current are not significant. Apparently, different tests lead to different conclusions. Both for the purpose of understanding the pathophysiological effects of changes in KNa current and for making further numerical simulations, more explicit clarifying statements should be made.

      We apologize for the confusion on the description of these statistics. The results come from the same test, which is a Generalized Linear Mixed Model (GLMM). The factors in our GLMM were voltage step, genotype, and a voltage step x genotype interaction term. The overall effect of genotype is significant in glutamatergic neurons, but pairwise tests at each voltage step show no significant effect of genotype at any given voltage. This is somewhat analogous to running a traditional ANOVA on multiple groups and finding a significant ANOVA p-value but no significant post-hoc multiple comparisons tests, and is not uncommon. Our interpretation of this is that heterozygous expression of the YH variant in glutamatergic neurons likely increases KNa currents across positive potentials (as was seen with the YH-HOM glutamatergic neurons), but only a small amount at each positive step; thus, we lack the statistical power to determine any particular voltage step where this occurs.

      (4) The effects of the KCNT1 channel blocker VU170 on potassium currents are somewhat larger and different from those of TTX, suggesting that additional sources of sodium may contribute to activating KCNT1, as suggested by the authors. Because VU170 is, however, a novel pharmacological agent, it may be appropriate to make more careful statements on this. While the original published description of this compound reported no effect on a variety of other channels, there are many that were not tested, including Na and cation channels that are known to activate KCNT1, raising the possibility of off-target effects.

      We agree and thank the reviewer for making this point. To address this question, we measured KNa currents in WT vs. Kcnt1/Kcnt2-dKO neurons using VU170 to illustrate the extent of outward current due to off-target effects of the drug. These data have been included as a supplemental figure (Supplemental Fig. 5-2). We have also added several sentences to the Results section referencing this figure. Interestingly, in Kcnt1/Kcnt2-dKO neurons, VU170 seems to be quite specific across the negative potentials, as no outward currents are apparent until approximately -10 mV onward, whereas across positive potentials, there is a VU170-sensitive outward current reaching ~1 nA by +50 mV. We have also included a note of caution in interpreting these data and added the possibility of off-target effects of VU170 as an alternative explanation for the differences observed on KNa currents between TTX and VU170 to the Discussion section.

      (5) The experiments were carried out at room temperature. Is it possible that different effects on firing patterns in heterozygotes and homozygotes would be observed at more physiological temperatures?

      Yes, it is reasonable to assume that an increased temperature would affect neuronal firing patterns in cultured neurons, as temperature differences have been shown to alter synaptic transmission and neuronal function, as assessed in both cultured neuron and slice recordings. All of our recordings were performed at room temperature in this study, and although they are valid with regard to between-group comparisons, this additional caveat is worth mentioning. We have added this to the paragraph describing study limitations in the Discussion section.

      To better understand the effects of temperature in our recordings, we have now compared membrane and AP generation parameters at room temperature (~22°C) and at a more physiological temperature (35°C) in a before-after study of 16 WT neurons, including both glutamatergic and GABAergic neuron types. Not surprisingly, we found robust alterations in all parameters assessed, excluding resting membrane potential and capacitance. We further assessed the effect of temperature on WT and YH-HET PV neurons, as the PV neurons expressing the YH variant showed the most unexpected phenotypes in our study. In our room temperature recordings, we showed that the YH-HET variant decreased the rheobase current, increased the AP amplitude, and increased the AP firing. In our before-after comparison (22°C-35°C) of PV neurons (WT; n=11, YH-Het; n=10), the WT and YH-HET neurons showed the same temperature-dependent effects on these parameters, including increased rheobase, decreased AP amplitude, and a higher maximal firing rate, at 35°C compared to those at 22°C. These data have been added to the manuscript as a supplemental figure (Supplemental Fig. 4-3) and are briefly referenced and described in the Results section.     

      Moreover, in our original manuscript, we showed that the effects of the homozygous YH variant on glutamatergic and NFS GABAergic neuron excitability were highly similar between cultured recordings at room temperature (~22°C) and slice recordings at 32°C. Taken together, these data suggest that the reported neurophysiological phenotypes downstream of the YH variant are likely not temperature dependent. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shore et al. investigate the consequent changes in excitability and synaptic efficacy of diverse neuronal populations in an animal model of juvenile epilepsy. Using electrophysiological patch-clamp recordings from dissociated neuronal cultures, the authors find diverging changes in two major populations of inhibitory cell types, namely somatostatin (SST)- and parvalbumin (PV)-positive interneurons, in mice expressing a variant of the KCNT1 potassium channel. They further suggest that the differential effects are due to a compensatory increase in the persistent sodium current in PV interneurons in pharmacological and in silico experiments.

      Strengths:

      (1) Heterozygous KCNT1 gain of function variant was used which more accurately models the human disorder.

      (2) The manuscript is clearly written, and the flow is easy to follow. The authors explicitly state the similarities and differences between the current findings and the previously published results in the homozygous KCNT1 gain of function variant.

      (3) This study uses a variety of approaches including patch clamp recording, in silico modeling, and pharmacology that together make the claims stronger.

      (4) Pharmacological experiments are fraught with off-target effects and thus it bolsters the authors' claims when multiple channel blockers (TTX and VU170) are used to reconstruct the sodium-activated potassium current. Having said that, it would be helpful to see the two drug manipulations used in the same experiment. Notably, does the more selective blocker VU170 mimic the results of TTX for NFS GABAergic cells in Figure 2? And does it unmask a genotype difference for FS GABAergic cells like the one seen in PV interneurons in Figure 5C3?

      To illustrate the two drug manipulations in the same experiment, we recorded from WT SST and PV neurons (5 neurons/group) and blocked KNa currents first using TTX and then VU170, following wash out between the two drugs, in the same neurons. Below, we have plotted the points at each voltage step for each SST and PV neuron, and for each drug treatment, on the same graph to show how they vary directly. At each voltage step, lines connect the points representing the TTX-sensitive and VU-sensitive currents for each neuron to show the individual effects (left-most graphs). Summary data are shown across all voltages (middle graphs) and across negative voltages (right-most graphs).

      Author response image 1.

      We have not used VU170 on FS and NFS populations of GABAergic neurons. However, for reasons that are explained more extensively below in response to Weaknesses #3, we would not predict KNa currents recorded from SST- and PV-GABAergic neurons to mimic those of NFS- and FS-GABAergic neurons, respectively.

      Weaknesses:

      (1) This study relies on recordings in dissociated cortical neurons. Although specific WT interneurons showed intrinsic membrane properties like those reported for acute brain slices, it is unclear whether the same will be true for those cells expressing KCNT1 variants. This reviewer highly recommends confirming some of the key findings using an ex vivo slice preparation. This is especially important given the discrepant result of reduced excitability of PV cells reported by Gertler et al., 2022 (cited here in the manuscript but not discussed in this context) in acute hippocampal slices for a different KCNT1 gain of function variant.

      We thank the reviewer for this suggestion. To test whether SST-expressing YH-HET neurons show similar impairments to those observed in culture, we crossed the FVB-Tg(GadGFP)45704Swn/J transgenic mouse line (Jackson Labs #003718), also known as the GIN line, to the Kcnt1-YH line. Mice from the GIN line express eGFP in a subpopulation of SST-expressing neurons in the hippocampus and cortex. We performed slice recordings of cortical layer 2/3, GFP-expressing neurons from P21-30, WT and YH-HET GIN mice. Although the input resistance was not significantly decreased, the rheobase was higher in the YH-HET neurons, and they fired fewer APs across increasing current steps, than WT neurons, supporting the main findings from the SST-expressing neurons in culture. These data have been added to the manuscript in a new figure (Fig. 9).

      Regarding the previously published results on the effect of KCNT1 GOF on PV neuron excitability by Gertler et al., we have written a new paragraph in the Discussion section (last paragraph of the section, “Neuron-type-dependent KCNT1 GOF effects”) that discusses the differences between the findings by Gertler et al. and the current study. 

      To further investigate the effects of heterozygous YH variant expression on SST- vs. PV-expressing neuron excitability in ex vivo slice recordings, we are now crossing a cre-inducible, Td-Tomato Red reporter line (Ai9) to the Kcnt1-YH line. After obtaining Ai9Tg/Tg; Kcnt1m/+ mice, we will cross these to Sst-Cre and Pvalb-Cre lines to be able to record from marked SST and PV, WT and YH-HET neurons in slice. We plan on submitting results from these recordings as an eLife Research Advances article linked to this article.

      (2) It is unclear how different pieces of results fit together to form a story about the disease pathophysiology.

      We have added a paragraph to the Discussion to speculate on how these various GABAergic subtype-specific effects downstream of the YH variant may contribute to overall network/brain pathology and seizure propensity in heterozygous mice.

      For example, hyperexcitability of PV cells would suggest more inhibition which would counter seizure propensity. However, spontaneous inhibitory postsynaptic currents show no change in pyramidal neurons. Moreover, how do the authors reconcile that the reductions in synaptic inputs onto interneurons in Figure 3B with the increases in Figure 8? This should be discussed.

      Generally, network and synaptic alterations downstream of the heterozygous variant were quite minimal compared with those of the homozygous variant. Although there were reductions in the frequency of synaptic inputs onto inhibitory neurons, the changes were relatively small. Thus, we concluded that the neuronal effects downstream of the heterozygous YH variant were below some threshold to result in broader network effects on synaptic activity and connectivity similar to those in the homozygous YH model. The discrepancies between our GABAergic vs. FS/NFS vs. VIP/SST/PV data will be discussed in more detail in response to Weakness #3.   

      (3) Similarly, the results in this work are not entirely internally consistent. For example, given the good correspondence between FS and NFS GABAergic cells with PV and SST expression, why are FS GABAergic cells hyperexcitable in Figure 1? If anything, there is a tendency to show reduced excitability like the NFS GABAergic cells.

      In our neuron cultures, 76-80% of Neu-N-expressing neurons are GFP+ (from the CamKII-eGFP virus used to mark glutamatergic neurons), and of the remaining ~20-24%, the majority are GABAergic (verified using the Dlx5/6-mRuby virus to mark GABAergic neurons and using electrophysiology to assess AP parameters and analyze evoked responses). In our original experiments, recordings sampled from this larger GABAergic population were used (Fig. 3), or this population was sorted almost equally into FS and NFS (Figs. 1 and 2).

      In later experiments, we isolated and cultured neurons from VIP-Cre, SST-Cre, and PV-Cre mouse lines and marked these neuron types in vitro with a Cre-inducible mCherry virus. In the VIP-Cre cultures, ~6% of the GFP- population, or 1.2% of the Neu-N-population, was mCherry+. In the SST-Cre cultures, ~20.5% of the GFP- population, or 4.7% of the Neu-N-population, was mCherry+. In the PV-Cre cultures, less than 1% of the Neu-N-population was mCherry+, which is not surprising considering the relatively late onset of PV expression compared with those of VIP and SST. Thus, we would estimate that we are marking and recording from less than 30% of the total GABAergic population in these in vitro experiments, rather than the 80-90% that these three populations would sum to in vivo.  

      Furthermore, using our original criteria for sorting GABAergic neurons into FS and NFS subtypes, all VIP recorded neurons were of the NFS type, PV of the FS type, whereas SST were of the FS (38%) and NFS (62%) types, which is not far off from the significant fraction of SST neurons that have been shown to be fast-spiking in slice experiments (Kvitsiani et al., 2013; Urban-Ciecko & Barth, 2016). Therefore, the FS group consists of both PV and SST neurons, and the NFS group consists of both VIP and SST neurons, and likely also contains immature PV neurons that have not yet developed a fast-spiking phenotype. Taken together, this suggests that the data from these two sets of experiments (FS/NFS vs. VIP/SST/PV) are not directly comparable.
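      The population fractions quoted above can be checked against one another with a short calculation. The following is only an illustrative sketch: the function names are hypothetical, and the percentages are the approximate values stated in the response, not new measurements. It asks what GFP-negative fraction each pair of reported percentages implies, and whether that falls within the stated ~20-24% range.

```python
def implied_parent_fraction(share_of_all, share_of_parent):
    """Given a subtype's share of all NeuN+ neurons and its share of the
    GFP-negative population, return the implied GFP-negative fraction."""
    return share_of_all / share_of_parent


def within_reported_range(value, low=0.20, high=0.24, tol=1e-9):
    """Check a fraction against the ~20-24% GFP-negative range quoted above."""
    return low - tol <= value <= high + tol


# Approximate values quoted in the response (fractions, not percentages).
vip_implied = implied_parent_fraction(0.012, 0.06)    # VIP: 1.2% of all, ~6% of GFP-
sst_implied = implied_parent_fraction(0.047, 0.205)   # SST: 4.7% of all, ~20.5% of GFP-

print(f"VIP cultures imply GFP- fraction of about {vip_implied:.3f}")
print(f"SST cultures imply GFP- fraction of about {sst_implied:.3f}")
print(within_reported_range(vip_implied), within_reported_range(sst_implied))
```

      Both implied values fall inside the reported range, so the two ways of expressing each subtype's abundance agree.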

      Also, why do the WT I-V curves look so different between Figures 2 and 5? This reviewer suggests at least a brief explanation in the discussion.

      As to the differences in appearance between the WT I-V curves in Figures 2 and 5, those plots are from different neuron types (Fig. 2: Glutamatergic, FS GABAergic, and NFS GABAergic vs. Fig. 5: VIP-, SST-, and PV-expressing), and the KNa currents are isolated using different methods (Fig. 2: TTX-subtraction vs. Fig. 5: VU170-subtraction). TTX blocks an inward Na+ current, which is apparent across subthreshold voltages in Fig. 2C1-3, whereas VU170 does not block this current, making it not apparent in Fig. 5C1-3. Also, the bottom three panels in Fig. 2C1-3 show the KNa current from -80 to 0 mV, whereas those in Fig. 5C1-3 show from -80 to -30 mV, to better illustrate the areas spanning KNa current increases, so their appearance is not directly comparable.
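      The subtraction approach mentioned above (blocker-sensitive current = current before the blocker minus current after it) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name, voltage steps, and current values are all hypothetical and chosen only to show why a blocker that also removes an inward current yields a net inward difference current at subthreshold voltages.

```python
def sensitive_current(before_pa, after_pa):
    """Blocker-sensitive current at each voltage step:
    trace recorded before the blocker minus trace recorded after it."""
    if len(before_pa) != len(after_pa):
        raise ValueError("traces must cover the same voltage steps")
    return [b - a for b, a in zip(before_pa, after_pa)]


# Hypothetical steady-state currents (pA) at illustrative voltage steps (mV).
voltages_mv = [-80, -40, 0, 40]
control_pa = [-50.0, -20.0, 300.0, 1500.0]
after_blocker_pa = [-50.0, -5.0, 200.0, 1000.0]

difference_pa = sensitive_current(control_pa, after_blocker_pa)
for v, i in zip(voltages_mv, difference_pa):
    print(f"{v:>4} mV: {i:+.1f} pA")
```

      In this made-up example the difference current is negative at -40 mV, which is the kind of subthreshold inward component expected when the blocker removes an inward Na+ current in addition to the outward current it is used to isolate.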

      (4) Given the authors' claim that the KCNT1 activation curve is a major contributor to the observed excitability differences in specific GABA cell subtypes, it would be helpful to directly measure the activation curve in the variants experimentally as was done for WT KCNT1 in Figure 6A and use the derived kinetics in the compartmental model.

      We apologize for the confusion. Although the activation curves among different GABAergic subtypes from WT KCNT1 are distinct, and we believe that these varying kinetics contribute to the neuron-type-specific phenotypes of KCNT1 GOF, we didn’t intend to suggest that the heterozygous Y777H variant itself causes neuron-type-specific alterations to the activation curves of the GABAergic subtypes. To clarify this point, below, we show the high similarity of the activation curves between WT KCNT1 and YH-HET KCNT1 in each of the GABAergic subtypes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The present manuscript by Shore et al. entitled "Reduced GABAergic Neuron Excitability, Altered Synaptic Connectivity, and Seizures in a KCNT1 Gain-of-Function Mouse Model of Childhood Epilepsy" describes in vitro and in silico results obtained in cortical neurons from mice carrying the KCNT1-Y777H gain-of-function (GOF) variant in the KCNT1 gene encoding for a subunit of the Na+-activated K+ (KNa) channel. This variant corresponds to the human Y796H variant found in a family with Autosomal Dominant Nocturnal Frontal lobe epilepsy. The occurrence of GOF variants in potassium channel encoding genes is well known, and among potential pathophysiological mechanisms, impaired inhibition has been documented as responsible for KCNT1-related DEEs. Therefore, building on a previous study by the same group performed in homozygous KI animals, and considering that the largest majority of pathogenic KCNT1 variants in humans occur in heterozygosis, the Authors have investigated the effects of heterozygous Kcnt1-Y777H expression on KNa currents and neuronal physiology among cortical glutamatergic and the 3 main classes of GABAergic neurons, namely those expressing vasoactive intestinal polypeptide (VIP), somatostatin (SST), and parvalbumin (PV), crossing KCNT1-Y777H mice with VIP-, SST- and PV-cre mouse lines, and recording from GABAergic neurons identified by their expression of mCherry (but negative for GFP used to mark excitatory neurons).

      The results obtained revealed heterogeneous effects of the variant on KNa and action potential firing rates in distinct neuronal subpopulations, ranging from no change (glutamatergic and VIP GABAergic) to decreased excitability (SST GABAergic) to increased excitability (PV GABAergic). In particular, modelling and in vitro data revealed that an increase in persistent Na current occurring in PV neurons was sufficient to overcome the effects of KCNT1 GOF and cause an overall increase in AP generation.

      Strengths:

      The paper is very well written, the results clearly presented and interpreted, and the discussion focuses on the most relevant points.

      The recordings performed in distinct neuronal subpopulations are a clear strength of the paper. The finding that the same variant can cause opposite effects and trigger specific homeostatic mechanisms in distinct neuronal populations is very relevant for the field, as it narrows the existing gap between experimental models and clinical evidence.

      Weaknesses:

      My main concern is in the epileptic phenotype of the heterozygous mice investigated. In fact, in their previous paper the Authors state that "...Kcnt1-Y777H heterozygous mice did not exhibit any detectable epileptiform activity" (first sentence on page 4). However, in the present manuscript, they indicate twice in the discussion section that these mice exhibit "infrequent seizures". This relevant difference needs to be clarified to correctly attribute to the novel pathophysiological mechanism a role in seizure occurrence. Were such infrequent seizures clearly identified on the EEG, or were behavioral seizures? Could the authors quantify this "infrequent" value? This is crucial also to place in the proper perspective the Discussion statement regarding "... the increased INaP contribution to ... network hyperexcitability and seizures".

      We apologize for the confusion. Indeed, in the Results section from our previous paper, we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      Of the two observed seizures, one seizure was captured in the Weston Lab at the University of Vermont from a Kcnt1-Y777H heterozygous mouse expressing a calcium indicator (after it was bred to the Snap25-GCaMP6s line) during a Ca2+ widefield imaging experiment, and it was accompanied by a time-locked video of the seizure event. The other seizure was recorded as a control during a drug study using video-EEG. This Kcnt1-Y777H heterozygous mouse had multiple tonic seizures, as evidenced by EEG traces and the accompanying video, which were recorded and analyzed in the Frankel Lab at Columbia University. The seizures from heterozygous mice have not been officially quantified, as they have only been rarely observed across multiple different experiments using heterozygous mice at multiple institutions, making quantification quite difficult.

      Lastly, regarding attributing the role of the identified pathological mechanisms to seizure occurrence mentioned by the reviewer, we have added a paragraph to the Discussion to speculate on how the various GABAergic subtype-specific effects downstream of the YH variant may contribute to the general lack of network/brain pathology and seizure generation in heterozygous mice.  

      Also, some statistical analysis seems to be missing. For example, I could not find any for the data shown in Fig. 6. Thus, the following statement: "the model PV neurons responded to KCNT1 GOF with decreased AP firing and an increased rheobase" requires proper statistical evaluation.

      We thank the reviewer for this suggestion. We were initially hesitant to apply a formal statistical analysis to the modeling data because it differs in important ways from the experimental data. However, we have now provided statistical analyses of these data, with some caveats. Because we applied each KCNT1 GOF level (40, 35, and 30 mM) to the same set of neurons for each data set, we performed repeated measures ANOVA analyses to assess differences due to GOF in each subtype. We note that some changes are statistically significant, but may not be physiologically relevant. For example, there are changes in input resistance and rheobase in VIP neurons only at the higher GOF level (30 mM), but the magnitude of each change is quite small relative to those in SST neurons (Rin: 1.7 MΩ in VIP vs. 23.2 MΩ in SST, rheo: 1.7 pA in VIP vs. 52.5 pA in SST), and likely as a consequence, there are no downstream effects on the AP firing rate at either GOF level in VIP neurons. It is important to examine the magnitude of the effects and interpret them in the context of the changes in other neuron types and in the experimental data; thus, we’ve provided two new figures to better illustrate the relative changes in each neuron type (Supplemental Figures 6-2 and 7-2). We have also added these statistical results to Figures 6E2, 6F2, 6G2, and 7E, and Supplemental Fig. 6-1, and we have described them in the Results section. A summary of the statistics has also been added in Supplemental Table 6.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In addition to addressing the weaknesses highlighted in the public review, this reviewer recommends using a KCNT1 agonist such as loxapine to see if activating the potassium channel mimics the KCNT1 GOF in SST and PV cells.

      Although we appreciate this suggestion, we’re not sure whether treating GABAergic subtypes with loxapine would provide much clarity in the absence of many supporting experiments. First, the amount of channel activation and any changes in kinetics caused by loxapine would need to be measured and compared to the YH-HET GOF effects in order to interpret the results. In addition, the aforementioned caveat about off-target effects of small molecules would also have to be considered, as loxapine inhibits many other channels at nanomolar concentrations.

      More importantly, we hypothesize that several of the GABAergic subtype-specific effects of KCNT1 GOF result from homeostatic or adaptive mechanisms due to long-term increases in KNa currents. For instance, PV-expressing YH-HET neurons had a lower rheobase, increased AP amplitude, and increased AP firing frequency, effects that we believe are due, not to increased KNa currents themselves, but to a compensatory increase in a persistent Na+ current. For the SST neurons, we hypothesize that the increased capacitance and soma size, together with the increased electrical coupling, exacerbate the hypoexcitability phenotype downstream of the YH variant. Thus, we would not necessarily expect that opening KCNT1 channels by acute loxapine treatment would mimic many of these effects.

      Indeed, in a previous study using a different KCNT1 GOF mouse model, loxapine treatment mimics KCNT1 GOF effects in some neuron types (reduced AP firing frequency in loxapine-treated, WT PV neurons mimics that observed in heterozygous KCNT1 GOF PV neurons), but not in others (reduced AP firing frequency in loxapine-treated, WT pyramidal neurons does not mimic the unaltered AP firing frequency observed in heterozygous and homozygous KCNT1 GOF pyramidal neurons) (Gertler et al., 2022).  

      Related to this suggestion by the reviewer, we are currently performing studies using a KCNT1 blocker in WT and Kcnt1-KO neurons to better understand the role of KCNT1 among cortical neuronal subtypes that will be published in a future manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Though I realize that primary cultures allow for efficient identification of neuronal subclasses, it would have been useful to show that similar changes also occur in neurons with conserved in vivo connectivity, such as those recorded from brain slices.

      We thank the reviewer for this suggestion. We have added an additional figure (Fig. 9) showing that the hypoexcitability phenotype observed in SST neurons in culture recordings is conserved in SST neurons in slice recordings from GIN mice, which express GFP predominately in SST-expressing neurons.

      In addition, further experiments in PV neurons from Kcnt1-Y777H homozygous mice would provide evidence for a gene-dosage role in the changes found in heteros.

      For this manuscript, we chose to focus our efforts on understanding the effects of heterozygous Kcnt1 variant expression in various neuronal subtypes with the goal of better modeling GOF variant effects in human disease. However, we’re very interested in investigating the effects of homozygous expression of the YH variant on each of the GABAergic subtypes to compare with those found in this study, but this requires more rounds of breeding to get homozygous mice with GABAergic subtype-specific expression of cre recombinase. We look forward to reporting the results from these experiments in a future manuscript.

      Also, when addressing the issue regarding the different effects of the same GOF variant on the excitability of distinct neuronal populations in the Discussion or Introduction sections, the authors may want to cite the recent work on KCNQ2 and KCNQ3 by the Tzingounis group (https://pubmed.ncbi.nlm.nih.gov/37607817/).

      We thank the reviewer for bringing this manuscript to our attention. We have added this citation to a new paragraph in the Discussion section regarding neuron-type specific effects of ion channel variants (the last paragraph focusing on the effects in PV neurons).

      Budelli, G., Hage, T. A., Wei, A., Rojas, P., Jong, Y. J., O'Malley, K., & Salkoff, L. (2009). Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci, 12(6), 745-750. https://doi.org/10.1038/nn.2313

      Gertler, T. S., Cherian, S., DeKeyser, J. M., Kearney, J. A., & George, A. L., Jr. (2022). K(Na)1.1 gain-of-function preferentially dampens excitability of murine parvalbumin-positive interneurons. Neurobiol Dis, 168, 105713. https://doi.org/10.1016/j.nbd.2022.105713

      Grubb, M. S., & Burrone, J. (2010). Activity-dependent relocation of the axon initial segment fine-tunes neuronal excitability. Nature, 465(7301), 1070-1074. https://doi.org/10.1038/nature09160

      Hage, T. A., & Salkoff, L. (2012). Sodium-activated potassium channels are functionally coupled to persistent sodium currents. J Neurosci, 32(8), 2714-2721. https://doi.org/10.1523/JNEUROSCI.5088-11.2012

      Kuba, H., Oichi, Y., & Ohmori, H. (2010). Presynaptic activity regulates Na(+) channel distribution at the axon initial segment. Nature, 465(7301), 1075-1078. https://doi.org/10.1038/nature09087

      Kvitsiani, D., Ranade, S., Hangya, B., Taniguchi, H., Huang, J. Z., & Kepecs, A. (2013). Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature, 498(7454), 363-366. https://doi.org/10.1038/nature12176

      Shore, A. N., Colombo, S., Tobin, W. F., Petri, S., Cullen, E. R., Dominguez, S., Bostick, C. D., Beaumont, M. A., Williams, D., Khodagholy, D., Yang, M., Lutz, C. M., Peng, Y., Gelinas, J. N., Goldstein, D. B., Boland, M. J., Frankel, W. N., & Weston, M. C. (2020). Reduced GABAergic neuron excitability, altered synaptic connectivity, and seizures in a KCNT1 gain-of-function mouse model of childhood epilepsy. Cell Rep.

      Urban-Ciecko, J., & Barth, A. L. (2016). Somatostatin-expressing neurons in cortical networks. Nat Rev Neurosci, 17(7), 401-409. https://doi.org/10.1038/nrn.2016.53

    1. Author response:

      We thank the reviewers for their constructive comments that will help us clarify and strengthen the paper. We will be happy to address all the comments and adjust the text accordingly. Regarding the suggestion in the assessment to include a “more thorough comparison with human behavior”, we believe this comment reflects one reviewer’s request to compare with order effects (primacy and recency); we did not see any other comments that would reflect this (our existing simulations do make contact with other human behavior regarding error distributions, including probability of recall, precision, sensitivity to reinforcement history, and dopamine manipulation effects on human WM). We thank the reviewers for this comment and we will conduct the appropriate simulations and analysis to compare with sequential effects in working memory.

    1. Author response:

      Reviewer #1 (Recommendations For The Authors): 

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest. 

      This paper provides an unparalleled examination of immune disorder in patients with DS. In a truly herculean effort, the authors provided the cumulative examination of over 440 patients with DS, confirmed the alterations in immune cell subsets (n=292, 96 controls) and multi-organ autoimmunity seen in these patients as they age, and identified autoantibody production that could contribute to conditions co-occurring in patients with DS. They also sought to look at whether the early immunosenescence seen in DS was due to the inflammatory profile by comparing age-associated markers in DS patients and euploid controls separately, finding that several markers are regulated with age regardless of group, while comparing the effect of age versus DS status on cytokine status identified inflammatory markers elevated in DS patients across the lifespan that do not increase with age or that increase with age only in the DS cohort. This is very interesting in the context of DS in particular, and immunity during aging in general. 

      The second part of the manuscript presents the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients. While the number of DS patients treated with tofacitinib was small, the results were often quite striking. Treatment was well-tolerated and the improvement of dermatological conditions was clear. The less responsive patients AA4 and AA2 provide a very clear illustration that these patients are sensitive to immune triggers during treatment. Additionally, the demonstration that patients' IFN scores and cytokine levels decreased without clear immunosuppression with tofacitinib treatment is encouraging, since treatment with this drug would need to be continuous. I would be curious to see if the patients added past the cutoff for interim analysis follow a similar trajectory. I would not ask the authors to add any data; the paper is well-written and logically constructed. 

      I only have a small comment: I really did not like how Figure 2 a, d, and g tethered the coloring to the magnitude of fold change to show the effect of DS particularly for 2a and 2g. Given that these fold changes are quite modest, the coloring is very light and hard to distinguish. The clear takeaway is that the effect on T cells is greatest, but there must be a better way to illustrate this. Perhaps displaying this graph on a non-white background could help with contrast. 

      We are grateful for the Reviewer’s very positive assessment of the manuscript and constructive feedback. We want to assure the Reviewer that similar analyses will be completed in the future for the entire cohort recruited into the trial to determine if similar trajectories and results are observed with the larger sample size. Additionally, following the Reviewer’s guidance, we will explore alternative ways to present the data in Figure 2 for greater clarity in a revised version of the manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      • Although the focus of the patients in the first part of the paper is on autoimmune/inflammatory conditions, it will be useful to also list the non-autoimmune infectious manifestations for reference with prevalence data. For example, otitis media, or lung infections (mentioned within the paper), or mucosal candidiasis. Same for other manifestations such as cardiac or malignant conditions. Given the impressive number of patients, it will be useful to the readers to have prevalence data for these as well, even in brief statements within the results. 

      We appreciate this inquiry by the Reviewer and will present additional data on the co-occurring conditions mentioned by the Reviewer in a revised version of the manuscript.

      • Have the authors looked at DN T cells and whether they may be enriched in DS patients, given their enrichment in some autoimmune conditions? 

      Thanks for this inquiry. We did examine DN T cells (double negative T cells), which we referred to in our Figure 2 and Figure 2 – figure supplement 1 as non-CD4+ CD8+ T cells. Although this T cell subset is mildly elevated (in terms of frequency among T cells) in individuals with Down syndrome, the result did not reach statistical significance after multiple hypothesis correction. This negative result is shown in the heatmap in Figure 2 – figure supplement 1d.

      • It would be useful to move the segment of the discussion that discusses the interim predefined analysis of the phase 2 trial to the corresponding segment of the results. As this reviewer was reading the paper, it was unclear why the interim analysis was done, whether it was predefined and it was not until the discussion that it became apparent. I believe it will help the readers to have a brief mention that this interim analysis was predefined and set to occur at the first 10 DS enrollees. Also, it would be helpful to state what is the total number of DS patients planned for enrollment in the Phase 2 trial which is continuing recruitment. 

      We appreciate this comment and will modify the text following the Reviewer’s guidance in the revised manuscript. The trial will be considered complete once a total of 40 participants undergo 16 weeks of treatment with good medicine compliance (less than 15% missed doses).

      • Although the authors present data on TPO autoantibodies before and after tofacitinib, it remains unclear whether the other non-TPO autoantibodies were altered during treatment or whether this was a TPO autoantibody-specific phenomenon. Was there an alteration in mature B cells or plasmablast populations after tofacitinib? If these data are available, they would further enhance the manuscript. If they are not available, it would be useful for the authors to discuss those in the discussion of the manuscript. 

      We are grateful for this comment, which strongly aligns with our future research interests and plans for the analysis of the full cohort once the trial is completed. In the interim analysis, we analyzed only autoantibodies related to autoimmune thyroid disease and celiac disease, as shown in the manuscript. However, we plan to complete a more comprehensive analysis of the effects of JAK inhibition on autoantibody production once the full sample set is available at the end of the trial. Likewise, the clinical trial protocol contemplates collection and processing of blood samples for immune mapping using mass cytometry, which will enable us to answer the Reviewer’s question about potential changes in B cell or plasmablast populations. Following the Reviewer’s guidance, we will discuss these planned analyses in the Discussion of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Cellular immune phenotyping data in Figure 2 presents a large number of patients with DS versus euploid controls (292 and 96 respectively). Given the relatively large cohort there would seem to be an opportunity to determine whether age or sex alters the immune phenotype shown, for example, TEMRAs, etc. Was the data analyzed in this way? 

      We welcome this comment, which clearly aligns with our research interests and planned additional analyses of these datasets generated by the Human Trisome Project. We can share with the Reviewer that although sex as a biological variable has minimal impacts on the strong immune dysregulation observed in Down syndrome, there are clear age-dependent effects, with some immune changes occurring early during childhood versus others taking place later in adult life. A manuscript describing a complete analysis of age-dependent effects on the multi-omics datasets in the Human Trisome Project is currently under preparation.

      (2) The authors should strongly consider incorporating/discussing the findings from Gansa et al, Journal of Clinical Immunology May 2024 - where they reviewed the immune phenotype of 1299 patients with Down syndrome. 

      Thanks for this suggestion. We will cite and discuss this recent paper in the revised manuscript.

      (3) It is difficult to differentiate patients Hs2 and Ps1 in Figure 5d. 

      Thanks for this observation. We will modify the labels for greater clarity in the revised manuscript.

      (4) Given their finding of no correlation between cytokine levels/immune phenotype and autoimmunity, some additional discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity would seem relevant (given that this was the basis for the clinical trial). The authors mention that cytokine levels may not be appropriate measures of disease in the patients. 

      We welcome this opportunity to expand the discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity and will do so in the revised manuscript.

      (5) Data availability statement: appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should perform experiments to answer this question: does Cav3 transcription increase in the G369i-KI, or is there instead some post-transcriptional modulation that permits surface expression of functional Cav3-containing channels in the absence of typical HVA Ca conductances? Also, the authors should determine whether G369i-KI can mediate Ca2+ release from intracellular stores and whether release from stores is upregulated as Cav3-containing channel expression (or function) is increased. 

      We performed transcriptomic (drop-seq) analysis to test whether a Cav3 subtype is upregulated in cones of G369i KI mice. These experiments show that, consistent with previous studies (PMID 35803735, 26000488), Cacna1h appears to be the primary Cav3 subtype expressed in mouse cones. However, as shown in new Supp.Fig.S3, there was no significant difference in the levels of Cacna1h transcripts in WT and G369i KI cones. Therefore, we propose that there may be some post-transcriptional modification, or alteration in a pathway that regulates channel availability, that enables the contribution of Cav3 channels to the whole-cell Ca2+ current in the absence of functional Cav1.4 channels in cones.

      We also performed Ca2+ imaging experiments in WT vs G369i KI cone terminals to assess whether the diminutive Cav3 current in G369i KI cone terminals may be compensated by upregulation of a Ca2+ signal such as from intracellular stores. Arguing against this possibility, depolarization-evoked Ca2+ signals in G369i KI cones were dramatically reduced compared to WT cones (new Fig.9). 

      Reviewer #2 (Recommendations For The Authors): 

      Major points- 

      (1) It is stated in too many places that cone features in the Cav1.4 knock-in are "intact", preserved, or spared, but this representation is not accurate. There are two instances in this study that qualify as intact when comparing KI to WT: 1) the photopic a-waves in the Cav1.4 knock-in (also demonstrated in Maddox et al 2020) and 2) latency to the platform (current MS, Figure 7f). However, in the numerous instances listed below, the authors compared the Cav1.4 knock-in to the Cav1.4 knock-out, and then referred to the KI as exhibiting intact responses. The reference point for intactness needs to be wildtype, as appropriately done for Figures 2 and 3, and when comparing the KI to the KO the phrasing should be altered; for example: "the KI was spared from the extensive degeneration witnessed in the KO....". 

      In most cases, we clearly note that there are key differences in the WT and the G369i KI cone synapses, which highlight the importance of Cav1.4-specific Ca2+ signals for certain aspects of the cone synapse. We disagree with the reviewer on the point that we did not often use the WT as a reference since most of our experiments involved comparisons of only WT and G369i KI (Figs. 3-6) or WT, G369i KI, and Cav1.4 KO (Figs.1,7—and in these cases comparisons specifically between WT and G369i KI mice were included). We used “intact” as a descriptor for G369i KI cone synapses since these are actually present, albeit abnormal in the G369i KI retina, whereas cone synapses are completely absent in the Cav1.4 KO retina. To avoid confusion, we modified our use of “intact” and “preserved” where appropriate.

      A. Abstract, line 34 to 35: ".......preserved in KI but not in KO.". 

      Abstract was rewritten and this line was removed.

      B. Line 36: "....synaptogenesis remains intact". The MS documents many differences in the morphology of KI and WT cones (immunofluorescence and electron microscopy data), which is counter to an intact phenotype. 

      The sentence was: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the Ca2+-independent role of Cav1.4 in cone synaptogenesis remains intact.”

      Here the meaning of “intact” refers to the Ca2+ -independent role of Cav1.4, not synapses. Thus, we have left the sentence unchanged.

      C. This strikes the right balance, lines 67 to 68: "....although greatly impaired.....". 

      D. Line 149, "Cone signaling to a postsynaptic partner is intact in G369i KI mice". This description is inaccurate. Here there is only WT and KI, and the text reads as follows in line 162: "terminals (Figure 6b). The ON and OFF components of EPSCs in G369i KI HCs were measurable, although lower in amplitude than in WT (Figure 6a,b)." Neither "measurable" nor "lower in amplitude" meet the definition of "intact", and actual numerical values are lacking in the text. 

      We have added results showing that there are no light responses in the Cav1.4 KO horizontal cells and have modified the sentence to: “Cone synaptic responses are present in horizontal cells of G369i KI but not Cav1.4 KO mice”. 

      We have modified discussion of these results as (line 210-213): “Consistent with the lack of mature ribbons and abnormal cone pedicles (Fig.1), HC light responses were negligible in Cav1.4 KO mice (Fig.8a,b). In contrast, the ON and OFF responses were present in G369i KI HCs although significantly lower in amplitude than in WT HCs (Fig. 8a,b).”

      E. Please add a legend to Figure 6a to indicate the intensities. The shape of the KI responses is different from the control which is worthy of discussion: i) there is no clear cessation of HC EPSCs in the KI during the light ON period (when release stops, Im fluctuations should be minimal), and ii) the "peaked" appearances of the initial 500ms of the On and Off periods are very similar in shape for the KI (hard to interpret in the same fashion as a control response). How were the On and Off amplitudes analyzed? Furthermore, the OFF current is not summarized in Figure 6D, but should not this be when Cav3 should be opening and triggering release: Off response-EPSC? Lastly, Figure 6b,d shows a ~70% reduction in On-current in the KI, and the KI example of 6b an 80% reduction in Off current compared to WT. Yet, the only place asterisks are used to indicate sig diff is the DNQX data within each genotype in Fig 6d. These data cannot be described as showing "intact" KI responses, and the absence of numerical and statistical values needs to be addressed. 

      New Fig.8a depicting the horizontal cell light responses has been modified to include the legend indicating light intensities. The ON and OFF amplitudes were analyzed as the peak current amplitudes. This information has been added to the legend.

      The reviewer is correct in that the OFF response represents the EPSC whereas the ON response represents the decrease in the EPSC with light. To avoid confusion, we changed the y axis label for the averaged data to read ON or OFF “response” rather than “current” in new Fig.8b.

      As the reviewer suggests, the more transient nature of the KI response during the light ON period could result from aberrant continuation of vesicular release during the light-induced hyperpolarization of cones in the KI mice, in contrast to the prolonged suppression of release by light which is evident in the WT responses. We speculated on this difference as follows (lines 237-241):

      “In addition to its smaller amplitude, the transient nature of the ON response in G369i KI HCs suggested inadequate cessation of cone glutamate release by light (Fig.8b). Slow deactivation of Cav3 channels and/or their activation at negative voltages20 could give rise to Ca2+ signals that support release following light-induced hyperpolarization of G369i KI cones.”

      We added asterisks to new Fig.8b,d indicating statistical differences, and a description of the tests to the legend.

      F. line 168 the section titled "Light responses of bipolar cells and visual behavior is spared in G369i KI but not Cav1.4 KO mice". 

      Changed to: “Light responses of bipolar cells and visual behavior are present in G369i KI but not Cav1.4 KO mice”

      Last sentence of ERG results, 189-190: "These results suggest that cone-to-CBC signaling is intact in G369i KI mice.". "Spared and intact" are not accurate descriptions. The ERG data presented here shows massive differences between WT and the KI, except in the instance of a-waves. 

      This sentence was removed.

      As for Figure 6, the results text related to Figure 7a-d does not present real numbers for ERG responses, and there is no indication of significant differences there or in the Figure panels. For instance, in Figure 7b, b-waves in KI are comparable to KO, except at the two highest-intensity flashes that show KI responses ~20% the amplitude of WT. Presentation of KI and KO data on a 6- to 10-fold expanded scale higher than WT can be misleading: a quick read of these Figure panels might make one incorrectly conclude that the KI is intact while the KO is impaired when compared to WT. The Methods section needs more details on the ERG analysis (e.g. any filtering out of oscillatory potentials when measuring b-wave, and what was the allowable range of time-to-peak for b-wave amplitude, etc.). 

      The vertical scaling of the ERG results in new Fig.10c,d has been changed so as to reflect clearly diminished responses of the KO and KI vs the WT. Further details regarding the ERG analysis were added to the Methods section.

      G. Can you point to other studies that have used the "visible platform swim test" used in Figure 7e, f, and specify further how mice were dark/light adapted prior to the recordings? 

      As referenced in the Methods, original line 674, the methods we used for the swim test were described in our previous study (PMID 29875267). Other studies that have used this assay include PMIDs: 28262416, 26402607.

      (2) The Maddox et al 2020 study does not safely address whether rods have a residual T-type Ca2+ current in the Cav1.4 KO or KI. The study showed that membrane currents measured from rods in the KI and KO retina were distinct from WT, supporting their claim that L-type Ca2+ current is absent in the KI and KO. However, the recordings had shortcomings that challenge the analysis of Ca2+ currents: i) collected at room temp (22-24 °C), ii) at an unknown distance from the terminal (uncertain voltage clamp), iii) with a very slow voltage ramp rate that is not suitable for probing T-type currents (Figure 1d Maddox 2020, 140 mV over 1 sec: 7 ms/mV), and iv) at a signal-to-noise that does not allow to resolve a membrane current under 1 pA (avg wt rod Ca2+ current was -3.5 pA, and line noise ~1 pA peak-to-peak in Maddox 2020). Suggestion: say T-type currents were not probed in Maddox et al 2020, but Davison et al 2022 did not find PCR signal for Cav3.2 in rods. 

      We disagree that recordings in the Maddox 2020 study were not sufficient to uncover a T-type current. The voltage ramps in that study were not much slower than that of the Davison et al. 2022 study (they used 0.19 mV/ms). Moreover, in new Supp. Fig.S1, we show that like the slower voltage ramp (0.15 mV/ms) used in the prior study of G369i KI rods, the voltage ramps we used in the present study (0.5 mV/ms), which clearly evoke currents with T-type properties in G369i KI cones (Fig.2a,b, Fig.3a,b) do not evoke currents in WT or G369i KI rods.  

      Minor comments. 

      (1) Suggestion: add an overview panel to Figure 1 that shows the rod terminals in the KI. The problem is that cropping out the ribbon and active zone signals from rods, to highlight cones, can give the impression that the cones are partially spared in the KI, and the rods are not spared at all. (yet you nicely clarify this in Figure 4 and in the legend and text, etc.). 

      We chose to modify the legend with this information as in Fig.4 rather than modify the figure.

      (2) Mouse wt cone Ca2+ currents look like L-type currents, as do your monkey and squirrel cone recordings, and also much like those of mouse rods (see Figure S5, Hagiwara et al., 2018 or Grabner and Moser 2021). Your pharm data from mice and squirrels further supports your conclusion, and certainly took much effort. Davison et al 2022 J Neurosci showed PCR results that support their claim that a Cav3 current exists in wt cones. Questions: 1) have you tried PCR? 2) Can you offer more details on what Cav3 KO you tried and what antibodies failed to confirm the KO? As the authors know, one complication is that the deletion of one Cav can be compensated for by the expression of a new Cav. There are 3 types of Cav3s and removal of one type may be compensated for by another Cav3. 

      We have included drop-seq data (new Supp.Fig.S3) implicating Cav3.2 as the main Cav3 subtype in cones and have modified our discussion of these results accordingly. These experiments did not reveal any changes in Cav3 subtype expression in G369i KI vs WT cones.

      (3) Lines 95/96- onward, spend more time telling the story. When working out the biophysical and pharmacological behavior of the Ca2+ currents, you might want to initially refer to the membrane current as a membrane current, and then state how your voltage protocols, intra- and extra-cell solutions, and drugs helped you verify 1) L-type and 2) T-type Ca2+ currents. 

      We have modified the text with more detail.

      (4) If data is in hand, add a ramp I-V to Figure S2, which shows the response of the ground squirrel cone. The steps in S2a are excellent for making your point that a transient current is missing, and the bipolar is a great control to illustrate ML218 works. However, a comparison of a squirrel cone ramp to a bipolar ramp response could complete the figure. 

      See Response to #5 below.

      (5) Consider moving Supplementary Figures S2 and S3 to the main text; these are highly relevant to the story, novel, and well-executed. 

      Fig.S2 and S3 were added as new Figs.4,5. The new Fig.4 includes voltage ramps in ground squirrel cones (panel a) to compare with the bipolar data (panel f).

      (6) The nice electron microscopy reconstructions are not elaborated on in any detail, and there is no mention of ribbon size. Is the resolution sufficient to estimate ribbon size, the number of synaptic vesicles around the ribbon and in the adjacent cytosol? The images indicate major changes in the morphology of the terminals. Is the glial envelope similar in WT and KI? 

      Since ribbons were quantified extensively in the confocal analyses in Fig.6, we felt it unnecessary to add this to the EM analysis which focused mainly on aspects of 3D structure (i.e., arrangement of ribbons, postsynaptic wiring, cone pedicle morphology). We added further discussion of the change in morphology of the G369i KI cone pedicle (lines 200-203): “Compared to WT, ribbons in G369i KI pedicles appeared disorganized and were often parallel rather than perpendicular to the presynaptic membrane (Fig.7a-c). Consistent with our confocal analyses (Fig.1), G369i KI cone pedicles extended telodendria in multiple directions rather than just apically (Fig. 7a).”

      While we did not opt to characterize the glial envelope in WT cones, we did add an analysis of synaptic vesicles around ribbons to Table 2.

      (7) Discussion line 250: "we found no evidence for a functional contribution of Cav3 in our recordings of cones in WT mice (Figures. 2,3), ground squirrels, or macaque (Supplementary Figures S2 and S3).". I would not use "functional" in this context because when comparing your work to Davison et al 2022, they defined functional as a separate response component driven by Cav3. For instance, they examined the influence of their T-type current on exocytosis (by membrane capacitance) and other features like spiking Ca2+ transients. Suggestion: substitute functional with "detectable", and say "we found no detectable Cav currents". Or if you had T-type staining, but not T-type Ca2+ currents, then say "no functional current even though there is staining...". 

      We have modified the text as (lines 336-338): “However, in contrast to recordings of WT mouse cone pedicles in a previous study21, we found no evidence for Cav3-mediated currents in somatic recordings of cones in WT mice (Figs.2,3).”

      We propose an alternative interpretation of the results in the Davison et al study concerning the conclusion that Cav3 channels contribute to Ca2+ spikes and exocytosis. That study used 100 µM Ni2+ to block a “T-type” contribution to spike activity in cones. In their Figs.4,5, the spikes are suppressed by 100 µM Ni2+ and 10 µM nifedipine, a Cav1 antagonist, and spared by the T-type selective drug Z944. This is problematic for several reasons. First, as shown by the authors (their Fig.2A1,A2) and others (PMID: 15541900), 100 µM Ni2+ inhibits Cav1-type currents in photoreceptors. Second, Z944 potentiates Cav1 current in their mouse cones (their Fig.2C1,C2). Thus, both reagents are suboptimal for dissecting the contribution of either Cav subtype to spiking activity. With respect to Cav3 channels and exocytosis, these authors interpreted a reduction in exocytosis upon holding at -39 mV compared to at -69 mV as indicating a loss of a T-type driven component of release. However, Cav1 channel inactivation (PMID: 12473074) could lead to the observed reduction in exocytosis at -30 mV.

      (8) Additional literature related to your Intro and Discussion. Regarding CSNB2, related mutations of active zone proteins, and what happens to Ca2+ currents when ribbons are deleted, you might want to consider the following studies that measure Ca2+ currents from rods: conditional KO of RIM1/2 (Grabner et al 2015 JN), KO of ELKS1/2 (Hagiwara et al, 2018 JCB), and KO of Ribeye (Grabner and Moser eLife 2021). In these studies, the Cav currents were absent in rods of the ELKS1/2 DKO, strongly reduced (80%) in the RIM1/2DKO, but altered in more subtle ways (activation-inactivation) without significantly changing steady-state Ca2+ current in the Ribeye KO. This does not seem to support some of the arguments you have made in the Introduction and Discussion regarding ribbon size and Ca2+ currents, yet the suggested literature is related to the topic at hand. 

      A description of these synaptic proteins as potential mediators of the effect of Cav1.4 on ribbon morphogenesis was added to the Discussion, lines 325-327.

      (9) Line 129: "Along with the major constituents of the ribbon, CtBP2, and RIBEYE", for clarity Ribeye has two domains, one that is identical to CtBP2 (B-domain) and the unique Ribeye domain (A-domain) that is only expressed at ribbon synapses. And, Piccolino is also embedded in the ribbon (Brandstaetter lab, Wichmann/Moser labs). In other words, Ribeye and Piccolino are the major constituents of the ribbon. 

      To avoid confusion, we simply mention CtBP2 and RIBEYE in the context of the corresponding antibodies that were used to label ribbons.

      (10) Abstract: consider rephrasing "Ca2+-independent role of Cav1.4" as "Ca2+-permeation-independent role of Cav1.4" or similar. 

      Sentence changed to: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the nonconducting role of Cav1.4 in cone synaptogenesis remains intact.”

      Reviewer #3 (Recommendations For The Authors): 

      Cav1.4 voltage-gated calcium channels play an important role in neurotransmission at mammalian photoreceptor synapses. Mutations in the CACNA1f gene lead to congenital stationary night blindness that particularly affects the rod pathway. Mouse Cav1.4 knockout and Cav1.4 knockin models suggest that Cav1.4 is also important for the cone pathway. Deletion of Cav1.4 in the knockout models leads to signaling malfunctions and to abundant morphological re-arrangements of the synapse suggesting that the channel not only has a role in the influx of Ca2+ but also in the morphological organization of the photoreceptor synapse. Of note, also additional Cav-channels have been previously detected in cone synapses by different groups, including L-type Cav1.3 (Wu et al., 2007; pmid; Kersten et al., 2020; pmid), and also T-type Cav3.2 (Davison et al., 2021; pmid 35803735). 

      In order to study a conductivity-independent role of Cav1.4 in the morphological organization of photoreceptor synapses, the authors generated the knockin (KI) mouse Cav1.4 G369i in a previous study (Maddox et al., eLife 2020; pmid 32940604). The Cav1.4 G369i KI channel no longer works as a Ca2+-conducting channel due to the insertion of a glycine in the pore-forming unit (Maddox et al., eLife 2020; pmid 32940604). In this previous study (Maddox et al., eLife 2020; pmid 32940604), the authors analyzed Cav1.4 G369i in rod photoreceptor synapses. In the present study, the authors analyzed cone synapses in this KI mouse. 

      For this purpose, the authors performed a comprehensive set of experimental methods including immunohistochemistry with antibodies (also with quantitative analyses), electrophysiological measurements of presynaptic Ca2+ currents from cone photoreceptors in the presence/absence of inhibitors of L-type and T-type calcium channels, electron microscopy (FIB-SEM), ERG recordings and visual behavior tests of the Cav1.4 G369i KI in comparison to the Cav1.4 knockout and wild-type control mice. 

      The authors found that the non-conducting Cav channel is properly localized in cone synapses and demonstrated that there are no gross morphological alterations (e.g., sprouting of postsynaptic components that are typically observed in the Cav1.4 knockout). These findings demonstrate that cone synaptogenesis relies on the presence of Cav1.4 protein but not on its Ca2+ conductivity. This result, obtained at cone synapses in the present study, is similar to the previously reported results observed for rod synapses (Maddox et al., eLife 2020, pmid 32940604). No further mechanistic insights or molecular mechanisms were provided that demonstrated how the presence of the Cav channels could orchestrate the building of the cone synapse. 

      We respectfully disagree regarding the mechanistic advance of our study. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. Our study provides an important contrast to the Maddox et al 2020 study in showing that rods and cones respond differentially to loss of Cav1.4 function, which is also relevant to the visual phenotypes of CSNB2. How the presence of Cav1.4 orchestrates cone synaptogenesis is an important topic that is outside the scope of our present study.

      In the present study, the authors also propose a homeostatic switch from L-type to (newly occurring) T-type calcium channels in the Cav1.4 G369i KI mouse as a consequence of the deficient calcium channel conductivity in the Cav1.4 G369i KI mouse. In cones of the Cav1.4 G369i, the high-voltage activated, L-type Ca2+-entry was abolished, in agreement with their previous paper (Maddox et al., eLife 2020, pmid 32940604). The authors found a low-voltage activated Ca2+ current instead that they assigned to T-type Ca2+-currents based on pharmacological inhibitor experiments. T-type Ca2+-currents/channels were already previously identified in other studies by independent groups and independent techniques (electrophysiology, RT-PCR, single-cell sequencing) in cones of wild-type mice (Davison et al., 2021, pmid 35803735; Macosko et al., 2015, pmid 26000488; Williams et al., 2022, pmid 35650675). In the present manuscript (Figures 3a/b), the authors also observed a low-voltage activated, T-type like current in cones of wild-type mice, that is isradipine-resistant and affected by the T-type inhibitor ML218. This finding appears compatible with a T-type-like current in wild-type cones and is consistent with the published data mentioned above, although the authors interpret this data in a different way in the discussion. 

      Due to the noise inherent in whole cell voltage clamp measurements and some crossover effects in the pharmacology, we cannot completely exclude the presence of a T-type current in WT mouse cones. However, our results very clearly support a conclusion opposite to that stated by the reviewer. Namely, if WT mouse cones have T-type Ca currents, then they are far smaller than those in the Cav1.4 G369i KI and KO cones. In particular, while we identified message for Cav3.2 in WT mouse cones, we were unable to identify a functional T-type current by either voltage clamp measurements or pharmacology. See below for a detailed rebuttal.

      This proposal of a homeostatic switch is not convincingly supported in this reviewer's opinion (for further details, please see below). Furthermore, no data on possible molecular mechanisms were provided that would support such a proposal of a homeostatic switch of calcium channels. No mechanistic/molecular insights were provided for a proposed homeostatic switch between L-type to T-type channels that the authors propose to occur between wild-type and Cav1.4 G369i as a consequence of conduction-deficient Cav1.4 G369i channels. Is this e.g. based on posttranslational modifications that switch on T-type channels or regulation at the transcriptional level inducing expression of T-type calcium channel or on other mechanisms? The authors remain descriptive with their central hypotheses. No molecular mechanisms/signaling pathways were provided that would support the idea of such a homeostatic switch. 

      Homeostatic plasticity refers to the maintenance of neuronal function in response to some perturbation in neuronal activity and can result from changes in the expression of ion channel genes (PMID: 36377048, 32747440, 19778903) or regulatory pathways that modulate ion channels (PMID: 15051886, 32492405). We present multiple lines of evidence showing that Cav3 currents appear in cones upon genetically induced Cav1.4 loss of function and can support cone synaptic responses and visual behavior if cone synapse structure is maintained. Our new transcriptomic studies show no difference between levels of Cav3 channel transcripts in WT and G369i KI cones, suggesting that the appearance of the Cav3 currents in G369i KI cones does not result from an increase in Cav3 gene expression. We are currently investigating our transcriptomic dataset to determine if Cav3 regulatory pathways are upregulated in G369i KI cones and will present this in a follow-up study.

      The authors show residual photopic signaling in the non-conducting Cav1.4 G369i KI mouse as judged by the recording of postsynaptic currents, ERG recordings and visual behavior tests, though in a reduced manner. The residual cone-based signaling could be based on the unaffected T-type Ca2+ channel conductivity in cone synapses. Given that the L-type current through Cav1.4 is gone in the Cav1.4 G369i KI as previously shown (Maddox et al., 2020, pmid 32940604), the T-type calcium current will remain. However, as discussed above, this does not necessarily support the idea of a homeostatic switch. 

      A major point which we highlighted with new results is that despite the expression of Cav3 transcripts in WT mouse cones, Cav3 channels do not contribute to the cone Ca2+ current. This is at odds with the Davison et al study (PMID: 35803735, see our response to Reviewer 2, pt 7 for caveats of this study), but our results convincingly show that the Cav3 current appears only when Cav1.4 is genetically inactivated. Pharmacological or electrophysiological methods that should reveal the presence of Cav3 currents do not change the properties of the Ca2+ current in cones of WT mice, ground squirrel, or macaque:

      • Figs.2-4: Voltage steps to -40 mV (Fig 2e) that activate a sizeable T-current in G369i KI mouse cones produce a negligible transient at pulse onset in WT mouse cones. Similarly, transient currents that are obvious in G369i KI mouse cones during the final step to -30 mV are absent in WT cones.  When we block Cav1.4 with isradipine either in cones of WT mice or ground squirrel, the current that remains does not resemble a Cav3 current but rather a scaled down version of the L-type current. ML218, which readily blocks Cav3 channels in HEK293T cells and in G369i KI cones, has only minor effects in cones of WT mice and ground squirrel; these effects of ML218 can be attributed to non-specific actions on Cav1.4 (new Supp.Fig.S2). New Fig.4 (moved from the supplementary data to the main article) clearly shows that the ML218-sensitive current in ground squirrel cones exhibits properties of Cav1.4 not Cav3 channels. 

      • Figs.2,5: Holding voltages that inactivate Cav3 channels have no effect on the Ca2+ current in cones of WT mice or macaque (recordings of macaque cones were moved from the supplement to the main article as new Fig.5).

      In Figure 4 the authors measured an increase in the size of the active zone (as judged by the size of the bassoon cluster) and of the synaptic ribbons in the Cav1.4 G369i. A mechanistic explanation for this phenomenon was not provided and the underlying molecular mechanisms were not unraveled. 

      The FIB-SEM data uncover some ultrastructural alteration/misalignments of the synaptic ribbons and misalignments of the regular arrangement of the postsynaptic dendrites in the G369i KI mice. Also concerning this observation, the study remains descriptive and does not reveal the underlying mechanisms as it would be expected for eLife. 

      We respectfully disagree on the descriptive nature of our study and the need for a full characterization of the molecular mechanism underlying the cone synaptic defects in the G369i KI mouse.   

      An important study in the field (Zanetti et al., Sci. Rep. 2021; pmid 33526839) should be also cited that used a gain-of-function mutation of Cav1.4 to analyze its functional and structural role in the cone pathway. 

      We have added citation of this paper to the Discussion (lines 354-356).

      In conclusion, the study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited. I think that the study might be better suited in a more specialized journal than eLife. 

      We thank the reviewer for acknowledging the rigor of our study but disagree with their evaluation regarding the novelty of our work as outlined in our responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      My comments are largely limited to suggestions to make the manuscript easier to read and digest.

      In the abstract they say RNA sequencing highlights changes in innate...

      Could they be more specific? Innate immune system up or down? They do not indicate actual findings in the abstract.

      We thank the reviewer for the comment and we have revised the abstract accordingly.  

      Their use of non‐intuitive abbreviations is often confusing. Perhaps they can add a table in methods listing all the abbreviations so that the reader can follow the data better. mNGA, vmHT....etc.

      As suggested, we have now included a list of the abbreviations used in the paper.

      There are mis‐spellings in the manuscript.

      We have gone through the manuscript and corrected the mis-spellings.   

      Has the SPR RNAi line been validated?

      The SPR RNAi line that we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. The effectiveness of SPR knockdown is evident in female flies: they exhibit dramatically reduced egg laying and, importantly, lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript and added these statements in the results section concerning SPR RNAi.  

      In the figures showing the Climbing Index vs time, can they abbreviate seconds as sec vs s? At least I think it is seconds. At first, I thought it was Time or Times, and was confused about what they were indicating on those types of graphs (Figures 1D‐F).

      We have revised the figure as suggested by the reviewer.

      In Figure 3F, they have a significance indicated in an unclear manner. It looks like they are comparing neuropil to the cortex, but I think they really mean to compare the cortex of sham to cortex of D31?

      The reviewer was correct. We have revised figure 3F to make this clear.     

      In Figure 4B, what is the y‐axis? Percentage of what? Is that percentage of total flies?

      The reviewer was correct. We have revised the figure to make this clear. 

      In a figure like SF3 B, what is the y‐axis? "Norm. Accum. CI" Can they explain the abbreviation?

      We have revised the Y-axis label to be “Normalized accumulative CI”.  We have also made this clear in the legend.   

      In the methods, what does this mean: "Regions devoid of Hoechst and phalloidin signal in non‐physiologically appropriate areas were considered vacuoles"? What are non‐physiologically appropriate areas? To me, that would mean outside of the brain. I would have thought the areas should be physiologically appropriate (aka neuropil and cortex)? This is confusing.

      We have revised the method section to be more specific.  In the Drosophila brain, there are structures such as esophagus that are devoid of both Hoechst and phalloidin staining, which were excluded from our vacuole quantification.    

      Reviewer #2 (Recommendations For The Authors):

      Since I use mammalian systems, my comment about the confirmation of siRNA should be removed if this is not possible in the Drosophila system.

      We have revised the figures to include total N values when appropriate. Including individual n values for each experimental assay and condition will inevitably crowd the figure legends, so specific values are available upon request. 

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirmation of the knockdown by IHC or qRT-PCR would further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. The effectiveness of SPR knockdown is evident in female flies: they exhibit dramatically reduced egg laying and, importantly, lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript to include these statements in the results concerning the SPR RNAi knockdown.    

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figures 1 and 2, the authors found that females have a lower climbing index in the acute phase in D17 injury, not due to neurodegeneration as shown no significant changes of brain vacuolation and other markers. However, in Figure 3, the authors found that female flies have a lower climbing index, more brain vacuolation, and neurodegeneration in the late phase. It's not very convincing that having a lower climbing index at the late phase is due to neurodegeneration. Is it possible that females suffered from more severe acute effects, at least in D17 injury?

      We thank the reviewer for this point. Female flies injured on D17 displayed acute climbing deficits at 90 minutes post-injury. Since we did not observe significant structural changes in the brain at this time, we believe that this short-term functional deficit is not due to acute neuronal death. Here it is important to note that males did not display any acute climbing deficits when injured on D17, which suggests that the females suffered from more severe acute effects than males. However, these injured female flies recovered fully at 24 hours post-injury and displayed no climbing deficits. At two weeks post-injury, we observe climbing deficits and increased vacuole formation as a direct result of the injuries on D17 (see Supplemental Figure 3). When we assessed sensorimotor behavior and brain vacuolation on D45, we found that the injured females had significantly lower climbing indices and more brain vacuolation than the non-injured females of the same age. In this case, the concurrent observance of decreased climbing ability and increased brain vacuolation suggests chronic neurodegeneration in aged, injured females. This is not to be confused with the acute neuronal death observed by other groups using injury models of stronger severity. Overall, our data are consistent with the current view that in many neurodegenerative diseases, functional deficits often precede observable brain degeneration, which may take years to manifest.

      (2) The authors determined late‐life brain deficits and neurodegeneration purely based on climbing index and vacuole formation. These phenotypes are not really specific to TBI‐related neurodegeneration and the significance and mechanisms of vacuole formation are not clear. Indeed, in Figures 3 A and B, male flies especially D31inj tend to have a much larger variation than any other groups. What could be the reasons? The authors should perform additional analyses on TBI‐related neurodegeneration in flies, which have been shown before, such as retinal degeneration and loss, neuronal degeneration, and loss, neuromuscular junction abnormalities, etc (Genetics. 2015 Oct; 201(2): 377‐402).

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that the mild physical impacts to the head represent one form of environmental insults, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains. 

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As we showed in our figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor affected the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and no acute and chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe any retinal degeneration and pseudopupil loss at the chronic stage of these flies. We have incorporated these important points in the revised manuscript.  

      (3) In Figure 4, it would be important to perform the behavior test fly speed and directional movement in the acute phase as well to determine whether the females have reduced performance at the acute phase.

      We thank the reviewer for this suggestion. Please note that our modified NGA has already improved the spatiotemporal resolution over the classic NGA.  The data presented in Fig.3 show that there are no acute deficits for young cohorts.  Therefore, we do not believe that the detailed analysis of the direction and speed of these flies is essential.  

      Unfortunately, the current setup for the AI-based analysis requires manual corrections of tracking errors, which are time-consuming and tedious.  We are building a newly designed AI-based NGA (NGA.ai) that will allow automatic tracking and quantification with minimal manual interventions. Once it is completed, we will perform some of the analyses that the reviewer suggested.  

      (4) In Figure 8, the authors performed an RNA‐seq analysis and identified some dysregulated gene expressions. However, it is really surprising to see so few DEGs even in wild‐type males and mated females, and to see that none of DEGs overlap among groups or related to the SP‐signaling. This raises questions about the validity of the RNAseq analysis. It is critical to independently verify their RNA‐sequencing results and to add some more molecular evidence to support their conclusion.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features of our study, which distinguish it from previous fly TBI studies: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint, which was long after the head injury and SP exposure. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in our next phase of experiments using single-cell RNA sequencing and RT-qPCR. 

      (5) The current results raise a series of interesting questions: what implication of female fly mating and its associated Sex Peptide signaling would be to mammalians or humans? Would mammalian female animals mating with wild‐type or sex hormone‐null male animals have different effects on their post‐injury behavior tests or neuropathological changes? What are the mechanisms underlying the sexual dimorphism?

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide signaling in other animals and humans. As far as we know, there is no known mammalian ortholog of the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration.  In this regard, drastic changes within the biology of female mammals associated with reproduction can potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models.  We have discussed this point in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you very much for reviewing our manuscript and express our sincere appreciation for the valuable and thoughtful comments that led us to significantly improve the manuscript on Fshr-ZsGreen reporter mice. We have taken your comments seriously and made a major revision of the manuscript; a summary of the revision follows:

      (1) New data on Fshr expression have been added to the revised manuscript:

      a. Fshr expression in the testis and adipose tissues (WAT and BAT) of B6 mice;

      b. Fshr expression in the testis of B6 by RNA-smFISH;

      c. Comparison of Fshr expression in the testis and ovary between Fshr-ZsGreen and B6 mice by ddRT-PCR to show that Fshr expression is not interrupted by insertion of the P2A-ZsGreen vector;

      d. Reduction of Fshr expression in osteocytes within the femoral sections from DMP1-CreERT2:Fshrfl/fl mice;

      e. Fshr expression in an established Leydig cell line, TM3, by immunofluorescence and ddRT-PCR, also showing Fshr located in the nuclei of TM3 cells;

      f. Fshr expression at scRNA-seq level from 5 public single cell portals as Supplementary Data 3 to support our findings of the widespread expression pattern of Fshr, particularly in Leydig cells.

      (2) Re-organization of Figure 2 with a new legend.

      (3) A new paragraph is added to the Discussion section of the revised MS to explain the function of the P2A peptide in the generation of GFP reporter mice and why Fshr expression is not interrupted by the P2A-ZsGreen insertion in the Fshr-ZsGreen reporter.

      (4) Deletion of Figure 1-D-c, as it is not necessary.

      (5) Replacement of Figure 8-A (the left panel) with an image taken at a reduced exposure time.

      (6) Amended parts of the revised MS are labeled in red.

      A point by point response to the Reviewers’ comments:

      Reviewer 1:

      One of the shocking observations in this manuscript is the expression of FSHR in Leydig cells. Other observations are in the osteoblasts and endothelial cells as well as epithelial cells in different organs. The expression of ZsGreen in these tissues seems high and one shall start questioning if there are other mechanisms at play here.

      First, the turnover of fluorescent proteins is long, longer than 48h, which means that they accumulate at a different speed than the endogenous FSHR. This means that ZsGreen will accumulate in time while the FSHR receptor might be degraded almost immediately. This correlated with mRNA expression (by the authors) but does not with the results of other studies in single-cell sequencing (see below).

      The expression of ZsGreen in Leydig cells seems much higher than in Sertoli cells, this is "disturbing" to put it mildly. This is visible in both the ZsGreen expression and the FISH assay (Figure 2 B-D).

      Thank you for these valuable comments. In the revised MS, we added new data on Fshr expression that prove the presence of Fshr in Leydig cells of B6 mice, detected by immunofluorescence staining, RNA-smFISH and ddRT-PCR, as well as in TM3 cells, a Leydig cell line isolated from a male mouse (Fig 2E, F and G). These data demonstrate that normal Fshr expression is not interrupted by insertion of the P2A-ZsGreen vector into the locus between exon 10 and the stop codon. We use ZsGreen as an indicator of active Fshr promoter status, rather than as a method to measure Fshr expression, which is done by ddRT-PCR. These data are shown in Figure 2G of the revised MS.

      In addition, we provide scRNA-seq based evidence on Fshr expression in human Leydig cells from two single cell portals (DISCO and BioGPS) as shown in Supplementary Data 3 in the revised MS. We also cited a recent report on scRNA-seq analysis of Fshr expression in Hu sheep in the revised MS as Reference 65 (PMID: 37541020) 1, which also clearly showed Fshr expression in Leydig cells at single cell level in Hu Sheep.

      We believe that the lack of Fshr expression in some single cell databases may be due to the degradation of the Fshr transcript in cells during the processing of single-cell populations. In our laboratory, we spent more than 6 months optimizing methods and reagents to preserve mRNA integrity at a score above 8 for RNA-seq.

      The expression in WAT and BAT is also questionable as the expression of ZsGreen is high everywhere. That makes it difficult to believe that the images are truly informative. For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen.

      FISH expression (for FSHR) in WT mice is missing.

      Also, the tissue sections were stained with the IgG only (neg control) but in practice both the KI and the WT tissues should be stained with the primary and secondary antibodies. The only control that I could think of to truly get a sense of this would be a tagged receptor (N-terminal) that could then be analysed by immunohistochemistry.

      Reply 2 and 3: Thank you for these comments. New data on Fshr expression in WAT and BAT of B6 mice by immunofluorescence staining, and in the testis of B6 mice by immunofluorescence staining and RNA-smFISH, are added to the revised MS (Fig. 2D and E, and Fig. 4G), showing patterns similar to those of Fshr-ZsGreen mice. Furthermore, we provide more evidence as Supplementary Data 3 on Fshr expression obtained from 4 public single cell portals, showing FSHR expression across a wide range of organs and tissues (including different fractions of adipose cells) of human, mouse and rat at the single-cell level. Please also check the Fshr expression pattern in adipose tissues by immunostaining for Fshr in previous reports (Fig. 3a of PMID: 28538730 and Fig. 2 of PMID: 25754247) 2 3, which showed an expression pattern similar to our finding. These data should address your concerns about Fshr expression in WAT and BAT and other organs/tissues.

      Regarding “For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen.”: we believe that you referred to the image of the aorta in Supplementary Data 2. However, please take a look at the images of the aorta in Figure 5-C, in which the layer of ‘elastin and collagen fibres’ stains positively for EMCN and a-SMA, colocalized with Fshr expression and DAPI staining at 1000X magnification, indicating that endothelial cells and cellular membranes are present in this layer, not just ‘elastin and collagen’.

      The authors also claim:

      To functionally prove the presence of FSHR in osteoblasts/osteocytes, we also deleted FSHR in osteocytes using an inducible model. The conditional knockout of FSHR triggered a much more profound increase in bone mass and decrease in fat mass than blockade by FSHR antibodies (unpublished data).

      This would be a good control for all their images. I think it is necessary to make the large claim of extragonadal expression, as well as intragonadal such as Leydig cells.

      Thank you for this very encouraging comment. As you suggested, we added to the revised MS a result showing reduced Fshr expression in osteocytes from DMP1-CreERT2+:Fshrfl/fl mice treated with tamoxifen (Figure 3D), demonstrating that Fshr is present in osteocytes and confirming the specificity of the Fshr antibody. Furthermore, we incorporated your advice on making the ‘large claim of extragonadal and intragonadal expression of Fshr’ into the revised MS in red.

      Claiming that the under-developed Leydig cells in FSHR KO animals are due to a direct effect of the FSHR, and not via a cross-talk between Sertoli and Leydig cells, is too much of a claim. It might be speculated to some degree but as written at the moment it suggests this is "proven".

      Thank you for pointing out this incorrect claim; we apologize for it. In the revised MS, we deleted this claim.

      We also do not know if this FSHR expressed is a spliced form that would also result in the expression of ZsGreen but in a non-functional FSHR, or whether the FSHR is immediately degraded after expression. The insertion of the ZsGreen might have disturbed the epigenetics, transcription, or biosynthesis of the mRNA regulation.

      Thanks for this comment. In the revised MS, we added a new section to explain the function of the P2A peptide in the generation of a GFP reporter by sgRNA-guided, site-specific knockin of the P2A-ZsGreen vector through CRISPR/Cas9. We also provided a new result comparing Fshr expression in the testes and ovaries of Fshr-ZsGreen and B6 mice, showing equivalent Fshr expression between Fshr-ZsGreen and B6 mice (Figure 2G), which indicates that Fshr expression is not interrupted by the insertion of the P2A vector.

      The authors should go through single-cell data of WT mice to show the existence of the FSHR transcript(s). For example here: https://www.nature.com/articles/sdata2018192

      Thank you so much for the valuable comment. Yes, we took your critical advice and checked Fshr expression through 4 single cell portals, including DISCO, GTEx, BioGPS and Human single cell portal, and present the collected data as Supplementary Data 3 in the revised MS; these data strongly support our findings of wider Fshr expression. In particular, Fshr expression in Leydig cells is supported by scRNA-seq studies of human cells from DISCO and BioGPS, as well as by a recent study in Hu sheep (PMID: 37541020) 1, which we cited in the revised MS.

      Reviewer 2:

      Is the FSHR expression pattern affected by the knockin mice (no side-by-side comparison between wt and GSGreen mice, using in situ hybridization and ddRTPCR, at least in the gonads, is provided)?

      Thanks for the comment. In the revised MS, we provided a set of new data on Fshr expression in the testis, ovary, WAT and BAT of B6 mice by immunofluorescence staining and by RNA-smFISH, showing similar expression patterns. Additionally, we performed ddRT-PCR to compare Fshr expression in the testes and ovaries of Fshr-ZsGreen and B6 mice, demonstrating equivalent Fshr expression between Fshr-ZsGreen and B6 mice. Interestingly, we also observed significantly higher Fshr expression in the testis than in the ovary (more than 30-fold).

      Is the splicing pattern of the FSHR affected in the knockin compared to wt mice, at least in the gonads?

      Thanks for the question. Please see our reply to Reviewer 1 regarding the function of the P2A peptide used for the generation of GFP reporters.  Although we didn’t directly assess the splicing pattern, we provide a comparison of Fshr expression in Figure 2F in the revised MS, which indirectly indicates no change in the splicing pattern. In the future, we will assess the splicing pattern of Fshr, which has been neglected in the field.

      “Are there any additional off-target insertions of GSGreen in these mice?” and “Are similar results observed in separate founder mice?”

      Thanks for the questions. As described in detail in the Methods section of the MS, the Fshr-ZsGreen reporter was produced by site-specific long ssDNA recombination of the P2A-ZsGreen targeting vector into the locus between Exon 10 and the stop codon using CRISPR/Cas9, guided by a site-specific single guide RNA (sgRNA). The Southern blot, DNA sequencing and site-specific PCR results shown in Figure 1 confirm the site-specific insertion of P2A-ZsGreen. Because the recombination is site-specific, only one founder line is required for the study, and there are no additional off-target insertions.

      How long is GSGreen half-life? Could a very long half-life be a major reason for the extremely large expression pattern observed?

      Thanks for the question. The half-life of ZsGreen, also called ZsGreen1, is at least 26 h in mammalian cells, or slightly longer owing to its tetrameric structure, in contrast with the monomeric configuration of other well-known fluorescent proteins (PMID: 17510373) (4). We chose ZsGreen because it is an exceptionally bright green fluorescent protein, up to 4X brighter than EGFP, and is ideally suited for whole-cell labelling and promoter-reporter studies, given the high turnover and rapid degradation of the Fshr transcript. In this study, we used ZsGreen as an indicator of the active endogenous Fshr promoter, rather than as a means of measuring promoter activity. Therefore, whether or not it accumulates, ZsGreen driven by the Fshr promoter indicates the presence of an active Fshr promoter in the defined cells. Instead, we used ddRT-PCR to measure Fshr expression levels. In addition, we provide single-cell sequencing-based evidence from four public single-cell portals to support our finding of wide Fshr expression. Please see Supplementary Data 3 in the revised MS.

      References:

      (1) Su J, Song Y, Yang Y, et al. Study on the changes of LHR, FSHR and AR with the development of testis cells in Hu sheep. Anim Reprod Sci. Sep 2023;256:107306. doi:10.1016/j.anireprosci.2023.107306

      (2) Liu P, Ji Y, Yuen T, et al. Blocking FSH induces thermogenic adipose tissue and reduces body fat. Nature. Jun 1 2017;546(7656):107-112. doi:10.1038/nature22342

      (3) Liu XM, Chan HC, Ding GL, et al. FSH regulates fat accumulation and redistribution in aging through the Galphai/Ca(2+)/CREB pathway. Aging Cell. Jun 2015;14(3):409-20. doi:10.1111/acel.12331

      (4) Bell P, Vandenberghe LH, Wu D, Johnston J, Limberis M, Wilson JM. A comparative analysis of novel fluorescent proteins as reporters for gene transfer studies. J Histochem Cytochem. Sep 2007;55(9):931-9. doi:10.1369/jhc.7A7180.2007

    1. Author response:

      eLife assessment 

      This important study identifies a novel gastrointestinal enhancer of Ctnnb1. The authors present convincing evidence to support their claim that the dosage of Wnt/β-catenin signaling controlled by this enhancer is critical to intestinal epithelia homeostasis and the progression of colorectal cancers. The study will be of interest to biomedical researchers interested in Wnt signaling, tissue-specific enhancers, intestinal homeostasis, and colon cancer. 

      We greatly appreciate editors’ and reviewers’ extensive and constructive comments and suggestions. We will do our utmost to revise the manuscript accordingly.

      Public Reviews: 

      Reviewer #1 (Public Review)

      Summary: 

      Ctnnb1 encodes β-catenin, an essential component of the canonical Wnt signaling pathway. In this study, the authors identify an upstream enhancer of Ctnnb1 responsible for the specific expression level of β-catenin in the gastrointestinal tract. Deletion of this promoter in mice and analyses of its association with human colorectal tumors support that it controls the dosage of Wnt signaling critical to the homeostasis in intestinal epithelia and colorectal cancers. 

      Strengths: 

      This study has provided convincing evidence to demonstrate the functions of a gastrointestinal enhancer of Ctnnb1 using combined approaches of bioinformatics, genomics, in vitro cell culture models, mouse genetics, and human genetics. The results support the idea that the dosage of Wnt/β-catenin signaling plays an important role in the pathophysiological functions of intestinal epithelia. The experimental designs are solid and the data presented are of high quality. This study significantly contributes to the research fields of Wnt signaling, tissue-specific enhancers, and intestinal homeostasis. 

      Weaknesses: 

      One weakness of this manuscript is an insufficient discussion on the Ctnnb1 enhancers for different tissues. For example, do specific DNA motifs and transcriptional factors contribute to the tissue-specificity of the neocortical and gastrointestinal enhancers? It is also worth discussing the potential molecular mechanisms controlling the gastrointestinal expression of Ctnnb1 in different species since the identified human and mouse enhancers don't seem to share significant similarities in primary sequences. 

      We agree with the reviewer that the manuscript lacks sufficient discussion of how enhancers control cell-type-specific expression of target genes, which is one of the most important questions in the field of transcription regulation. Equally important are the common and species-specific features of this regulation. In general, motif composition, location, order, and affinity for trans-factors within enhancers are four key elements. We will elaborate on this point in the revised manuscript.

      Reviewer #2 (Public Review): 

      Wnt signaling is the name given to a cell-communication mechanism that cells employ to inform on each other's position and identity during development. In cells that receive the Wnt signal from the extracellular environment, intracellular changes are triggered that cause the stabilization and nuclear translocation of β-catenin, a protein that can turn on groups of genes referred to as Wnt targets. Typically these are genes involved in cell proliferation. Genetic mutations that affect Wnt signaling components can therefore affect tissue expansion. Loss of function of APC is a drastic example: APC is part of the β-catenin destruction complex, and in its absence, β-catenin protein is not degraded and constitutively turns on proliferation genes, causing cancers in the colon and rectum. And here lies the importance of the finding: β-catenin has for long been considered to be regulated almost exclusively by tuning its protein turnover. In this article, a new aspect is revealed: Ctnnb1, the gene encoding for β-catenin, possesses tissue-specific regulation with transcriptional enhancers in its vicinity that drive its upregulation in intestinal stem cells. The observation that there is more active β-catenin in colorectal tumors not only because the broken APC cannot degrade it, but also because transcription of the Ctnnb1 gene occurs at higher rates, is novel and potentially game-changing. As genomic regulatory regions can be targeted, one could now envision that mutational approaches aimed at dampening Ctnnb1 transcription could be a viable additional strategy to treat Wnt-driven tumors. 

      We thank the reviewer for acknowledging the potential significance of the manuscript. We also recognize that targeting genomic regulatory regions to dampen Ctnnb1 transcription could be a promising strategy for treating Wnt-driven tumors, including many colorectal carcinomas. However, we would like to point out that there are significant technical challenges associated with AAV delivery to the GI epithelium, including the hostile environment, immune response, and low delivery efficiency.

      Reviewer #3 (Public Review): 

      The authors of this paper identify an enhancer upstream of the Ctnnb1 gene that selectively enhances expression in intestinal cells. This enhancer sequence drives expression of a reporter gene in the intestine and knockout of this enhancer attenuates Ctnnb1 expression in the intestine while protecting mice from intestinal cancers. The human counterpart of this enhancer sequence is functional and involved in tumorigenesis. Overall, this is an excellent example of how to fully characterize a cell-specific enhancer. The strength of the study is the thorough nature of the analysis and the relevance of the data to the development of intestinal tumors in both mice and humans. A minor weakness is that the loss of this enhancer does not completely compromise the expression of the Ctnnb1 gene in the intestine, suggesting that other elements are likely involved. Adding some discussion on that point would be helpful.

      We are quite encouraged by the reviewer’s positive comments. We agree with the reviewer that other cis-regulatory elements may be involved in the transcription of Ctnnb1 within the GI epithelium. It is also possible that the basal transcription of Ctnnb1 within the GI epithelium is relatively high, and that enhancers can only boost transcription within a certain range. We will discuss these possibilities in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript presents a machine-learning method to predict protein hotspot residues. The validation is incomplete, along with the misinterpretation of the results with other current methods like FTMap.

      We believe that validation is complete: The two most common techniques for testing and validating machine-learning methods are to split the dataset into either (1) a training set and a test set with a fixed ratio (e.g., 70% for training and 30% for testing) or (2) multiple subsets/folds; i.e., cross-validation. We did not employ a training set to train the model and a separate test set to evaluate its performance, as Reviewer 2 assumed. Instead, we employed cross-validation, as it helps reduce the variability in performance estimates compared to a single training/test split, and utilizes the entire dataset for training and testing, making efficient use of the limited data. Each fold was used once as a test set and the remaining folds as the training set - this process was repeated for each fold and the model's performance was measured using the F1 score. We had listed the mean validation F1 score in Table 1.

      We have clarified our comparison with FTMap - see reply to point 1 of Reviewer 1 below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper describes a program developed to identify PPI-hot spots using the free protein structure and compares it to FTMap and SPOTONE, two webservers that they consider as competitive approaches to the problem. On the positive side, I appreciate the effort in providing a new webserver that can be tested by the community but have two major concerns as follows.

      (1) The comparison to the FTMap program is wrong. The authors misinterpret the article they refer to, i.e., Zerbe et al. "Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces" J. Chem. Inf. Model. 52, 2236-2244, (2012). FTMap identifies hot spots that bind small molecular ligands. The Zerbe et al. article shows that such hot spots tend to interact with hot spot residues on the partner protein in a protein-protein complex (emphasis on "partner"). Thus, the hot spots identified by FTMap are not the hot spots defined by the authors. In fact, because the Zerbe paper considers the partner protein in a complex, the results cannot be compared to the results of Chen et al. This difference is missed by the authors, and hence the comparison of the FTMap is invalid. I did not investigate the comparison to SPOTONE, and hence have no opinion.

      Brenke et al. (Bioinformatics 2009 25: 621-627), who developed FTMap, defined hot spots as regions of the binding surface that “contribute a disproportionate amount to the binding free energy”. Kozakov et al. (Proc. Natl. Acad. Sci. 2011:108, 13528-1353) used unbound protein structures as input to FTMap to predict binding hot spots for protein-protein interactions (PPIs), which are defined as regions (so-called consensus sites) on a protein surface that bind multiple probe clusters − the main hot spot is the largest consensus site binding the largest number of probe clusters.

      Zerbe et al. (J. Chem. Inf. Model. 2012:52, 2236) noted that a consensus “site is expected to be important in any interaction that involves that region of the target independent of any partner protein.” They showed that hot spot residues found by Ala scanning not only overlapped with the probe ligands but also formed consensus sites, as shown in Figure 4. They stated that “A residue can also be identified as a hot spot by alanine scanning if it contributes to creating such a favorable binding environment by being among the residues forming a consensus site on the protein to which it belongs.”

      To clarify the comparison with FTMap in the revised version, we have added the following sentence in the Abstract on p. 3:

      “We explored the possibility of detecting PPI-hot spots using (i) FTMap in the PPI mode, which identifies hot spots on protein-protein interfaces from the free protein structure, and (ii) the interface residues predicted by AlphaFold-Multimer.”

      We have added the following sentences in the Introduction section on p. 4:

      “We explored the possibility of detecting PPI-hot spots using the FTMap server in the PPI mode, which identifies hot spots on protein-protein interfaces from free protein structures.45 These hot spots are identified by consensus sites − regions that bind multiple probe clusters.42,45,59 Such regions are deemed to be important for any interaction involving that region of the target, independent of partner protein.42 PPI-hot spots were identified as residues in van der Waals (vdW) contact with probe ligands within the largest consensus site containing the most probe clusters.”

      and in the Results section on p. 5:

      “Given the free protein structure, PPI-HotspotID and SPOTONE53 predict PPI-hot spots based on a probability threshold (> 0.5). FTMap, in the PPI mode, detects PPI-hot spots as consensus sites/regions on the protein surface that bind multiple probe clusters.59 Residues in vdW contact with probe molecules within the largest consensus site were compared with PPI-HotspotID/SPOTONE predictions.”

      (2) Chen et al. use a number of usual features in a variety of simple machine-learning methods to identify hot spot residues. This approach has been used in the literature for more than a decade. Although the authors say that they were able to find only FTMap and SPOTONE as servers, there are dozens of papers that describe such a methodology. Some examples are given here: (Higa and Tozzi, 2009; Keskin, et al., 2005; Lise, et al., 2011; Tuncbag, et al., 2009; Xia, et al., 2010). There are certainly more papers. Thus, while I consider the web server as a potentially useful contribution, the paper does not provide a fundamentally novel approach.

      Our paper introduces several novel elements in our approach: 

      (1) Most PPI-hot spot prediction methods employ PPI-hotspots where mutations decrease protein binding free energy by > 2 kcal/mol (J. Chem. Inf. Model. 2022, 62, 1052). In contrast, our method incorporates not only PPI-hot spots with such binding free energy changes, but also those whose mutations have been curated in UniProtKB to significantly impair/disrupt PPIs. Because our method employs the largest collection of experimentally determined PPI-hot spots, it could uncover elusive PPI-hot spots not within binding interfaces, as well as potential PPI-hot spots for other protein partners (see point 3 below). 

      (2) Whereas most machine-learning methods for PPI-hot spot prediction focus on features derived from (i) primary sequences or (ii) protein-protein complexes, we introduce novel features such as per-residue free energy contributions derived from unbound protein structures. We further revealed the importance of one of our novel features, namely, the gas-phase energy of the target protein relative to its unfolded state and provided the physical basis for its importance. For example, PPI-hot spots can enhance favorable enthalpic contributions to the binding free energy through hydrogen bonds or van der Waals contacts across the protein’s interface. This makes them energetically unstable in the absence of the protein’s binding partner and solvent; hence providing a rationale for the importance of the gas-phase energy of the target protein relative to its unfolded state.

      (3) As a result of these novel elements, our approach, PPI-HotspotID, could identify many true positives that were not detected by FTMap or SPOTONE (see Results and Figure 1). Previous methods generally predict residues that make multiple contacts across the protein-protein interface as PPI-hot spots. In contrast, PPI-HotspotID can detect not only PPI-hot spots that make multiple contacts across the protein-protein interface, but also those lacking direct contact with the partner protein (see Discussion).

      (4) Unlike most machine-learning methods which require feature customization, data preprocessing, and model optimization, our use of AutoGluon’s AutoTabular module automates data preprocessing, model selection, hyperparameter optimization, and model evaluation. This automation reduces the need for manual intervention.

      We have revised and added the following sentences on p. 9 in the Discussion section to highlight the novelty of our approach: 

      “Here, we have introduced two novel elements that have helped to identify PPI-hot spots using the unbound structure. First, we have constructed a dataset comprising 414 experimentally known PPI-hot spots and 504 nonhot spots, and carefully checked that PPI-hot spots have no mutations resulting in ΔΔGbind < 0.5 kcal/mol, whereas nonhot spots have no mutations resulting in ΔΔGbind ≥ 0.5 kcal/mol or impact binding in immunoprecipitation or GST pull-down assays (see Methods). In contrast, SPOTONE53 employed nonhot spots defined as residues that upon alanine mutation resulted in ΔΔGbind < 2.0 kcal/mol. Notably, previous PPI-hot spot prediction methods did not employ PPI-hot spots whose mutations have been curated to significantly impair/disrupt PPIs in UniProtKB (see Introduction). Second, we have introduced novel features derived from unbound protein structures such as the gas-phase energy of the target protein relative to its unfolded state.”

      Strengths:

      A new web server was developed for detecting protein-protein interaction hot spots.

      Weaknesses:

      The comparison to FTMap results is wrong. The method is not novel.

      See reply to points 1 and 2 above.

      Reviewer #2 (Public Review):

      Summary:

      The paper presents PPI-hotspot, a method to predict PPI-hotspots. Overall, it could be useful, but serious concerns about the validation and benchmarking of the methodology make it difficult to predict its reliability.

      Strengths:

      Develops an extended benchmark of hot-spots.

      Weaknesses:

      (1) Novelty seems to be just in the extended training set. Features and approaches have been used before.

      The novelty of our approach extends beyond just the expanded training set, as summarized in our reply to Reviewer #1, point 2 above. To our knowledge, previous studies did not leverage the gas-phase energy of the target protein relative to its unfolded state for detecting PPI-hot spots from unbound structures. Previous studies did not automate the training and validation process. In contrast, we used AutoGluon’s AutoTabular module to automate the training  of (i) individual “base” models, including LightGBM, CatBoost, XGBoost, random forests, extremely randomized trees, neural networks, and K-nearest neighbours, then (ii) multiple “stacker” models. The predictions of multiple “stacker” models were fed as inputs to additional higher layer stacker models in an iterative process called multi-layer stacking. The output layer used ensemble selection to aggregate the predictions of the stacker models. To improve stacking performance, AutoGluon used all the data for both training and validation through repeated k-fold bagging of all models at all layers of the stack, where k is determined by best precision. This comprehensive approach, including repeated k-fold bagging of all models at all layers of the stack, sets our methodology apart from previous studies, including SPOTONE (see Methods). 

      (2) As far as I can tell the training and testing sets are the same. If I am correct, it is a fatal flaw.

      The two most common techniques for testing and validating machine-learning methods are to split the dataset into either (1) a training set and a test set with a fixed ratio (e.g., 70% for training and 30% for testing) or (2) multiple subsets/folds; i.e., cross-validation. We did not employ a training set to train the model and a separate test set to evaluate its performance. Instead, we employed cross-validation, where the model was trained and evaluated multiple times. Each fold was used once as a test set and the remaining folds served as the training set - this process was repeated for each fold. For each test set, we assessed the model's performance using the F1 score. We had listed the mean validation F1 score in Table 1 in the original manuscript. Cross-validation helps reduce the variability in performance estimates compared to a single training/test split. It also utilizes the entire dataset for training and testing, making efficient use of the limited data. We have clarified this on p. 14 in the revised version:

      “AutoGluon was chosen for model training and validation due to its robustness and user-friendly interface, allowing for the simultaneous and automated exploration of various machine-learning approaches and their combinations. Instead of using a single training set to train the model and a separate test set to evaluate its performance, we employed cross-validation, as it utilizes the entire dataset for both training and testing, making efficient use of the limited data on PPI-hot spots and PPI-nonhot spots. AutoGluon-Tabular automatically chose a random partitioning of our dataset into multiple subsets/folds for training and validation. Notably, the training and validation data share insignificant homology, as the average pairwise sequence identity in our dataset is 26%. Each fold was used once as a test set, while the remaining folds served as the training set. For each test set, the model's performance was measured using the F1 score.”
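      The fold-wise procedure described in this reply can be illustrated with a minimal, standard-library sketch. It is not the authors' AutoGluon pipeline: the helper names (`f1_score`, `k_fold_indices`, `cross_validate`) and the toy threshold classifier are hypothetical, chosen only to show that each fold serves as the test set exactly once and that an F1 score is computed per fold.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return one F1 score per fold; each fold is the test set exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # All samples outside the current fold form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_pred = [predict(model, X[j]) for j in test_idx]
        scores.append(f1_score([y[j] for j in test_idx], y_pred))
    return scores


# Hypothetical toy learner: a threshold halfway between the two class means.
def fit_threshold(X_train, y_train):
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0
```

      Averaging the list returned by `cross_validate` gives a mean validation F1 of the kind the authors report in Table 1. A fold that happens to contain no positive examples scores 0.0 under this definition, which is one reason stratified folds are often preferred for small datasets.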

      (3) Comparisons should state that: SPOTONE is a sequence (only) based ML method that uses similar features but is trained on a smaller dataset. FTmap I think predicts binding sites, I don't understand how it can be compared with hot spots. Suggesting superiority by comparing with these methods is an overreach.

      In the Introduction on page 3, we had already stated that:

      “SPOTONE53 predicts PPI-hot spots from the protein sequence using residue-specific features such as atom type, amino acid (aa) properties, secondary structure propensity, and mass-associated values to train an ensemble of extremely randomized trees. The PPI-hot spot prediction methods have mostly been trained, validated, and tested on data from the Alanine Scanning Energetics database (ASEdb)55 and/or the Structural Kinetic and Energetic database of Mutant Protein Interactions (SKEMPI) 2.0 database.56”

      On p. 4, we have clarified how we used FTMap to detect hot spots - see reply to Reviewer #1, point 1.

      “We explored the possibility of detecting PPI-hot spots using the FTMap server in the PPI mode, which identifies hot spots on protein-protein interfaces from free protein structures.45 These hot spots are identified by consensus sites − regions that bind multiple probe clusters.42,45,59 Such regions are deemed to be important for any interaction involving that region of the target, independent of partner protein.42 PPI-hot spots were identified as residues in van der Waals (vdW) contact with probe ligands within the largest consensus site containing the most probe clusters.”

      (4) Training in the same dataset as SPOTONE, and then comparing results in targets without structure could be valuable.

      We think that the dataset used by SPOTONE is not as “clean” as ours, since SPOTONE employed nonhot spots defined as aa residues that upon alanine mutation resulted in ΔΔGbind < 2.0 kcal/mol. In contrast, we define nonhot spots as residues whose mutations resulted in ΔΔGbind changes < 0.5 kcal/mol. Moreover, we carefully checked that the nonhot spots have no mutations that result in ΔΔGbind changes ≥ 0.5 kcal/mol or that impact binding in immunoprecipitation or GST pull-down assays (see Methods). We cannot compare results in targets without structure because we require the free protein structure to compute the per-residue free energy contributions.
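      The labeling rules quoted in this response can be sketched as a small function. This is only an illustration under stated assumptions: the 0.5 kcal/mol cutoff comes from the response, whereas the 2.0 kcal/mol hot-spot cutoff, the `"excluded"` category for ambiguous residues, and the function and argument names are hypothetical and may differ from the authors' curation procedure.

```python
HOT_DDG = 2.0     # kcal/mol; assumed cutoff for a strong effect on binding
NONHOT_DDG = 0.5  # kcal/mol; nonhot-spot cutoff quoted in the response


def label_residue(ddg_values, disrupts_binding_in_assay=False):
    """Label one residue from all of its measured mutations.

    ddg_values: binding free energy changes (kcal/mol), one per mutation.
    disrupts_binding_in_assay: True if any mutation is reported to impair
        binding, e.g. in immunoprecipitation or GST pull-down assays.
    Returns "hot", "nonhot", or "excluded" (ambiguous evidence).
    """
    any_weak = any(d < NONHOT_DDG for d in ddg_values)
    any_strong = any(d >= HOT_DDG for d in ddg_values)
    # Hot spot: strong evidence, and no mutation with a negligible effect.
    if (any_strong or disrupts_binding_in_assay) and not any_weak:
        return "hot"
    # Nonhot spot: every measured mutation is below 0.5 kcal/mol and no
    # assay reports an impact on binding.
    if ddg_values and not disrupts_binding_in_assay and not any(
        d >= NONHOT_DDG for d in ddg_values
    ):
        return "nonhot"
    return "excluded"
```

      Under these assumptions, a residue with conflicting measurements (for example one mutation at 2.5 kcal/mol and another at 0.2 kcal/mol) is set aside rather than forced into either class, which is the sense in which such a dataset is "cleaner" than one built from a single 2.0 kcal/mol cutoff.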

      (5) The paper presents as validation of the prediction and experimental validation of hotspots in human eEF2. Several predictions were made but only one was confirmed, what was the overall success rate of this exercise?

      We did not test all predicted PPI-hot spots but only the PPI-hot spot with the highest probability of 0.67 (F794) and 7 other predicted PPI-hot spots that were > 12 Å from F794 as well as 4 predicted PPI-nonhot spots. Among the 13 predictions tested, F794 and the 4 predicted nonhot spots were confirmed to be correct. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Remove the comparison to FTMap, and find a more appropriate reference method, even if it requires installing programs rather than using the available web servers.

      We have clarified comparison to FTMap in the revised ms - see our reply above.

    1. Author response:

      eLife assessment

      This useful study examines the neural activity in the motor cortex as a monkey reaches to intercept moving targets, focusing on how tuned single neurons contribute to an interesting overall population geometry. The presented results and analyses are solid, though the investigation of this novel task could be strengthened by clarifying the assumptions behind the single neuron analyses, and further analyses of the neural population activity and its relation to different features of behaviour.

      Thanks for recognizing the content of our research, and please stay tuned for our follow-up studies on neural dynamics during interception.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity. The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We apologize for not providing sufficient grounds for the subtypes. We chose PD shift, gain, and addition as the pertinent subtypes based on the classical cosine tuning model (Georgopoulos et al., 1982) and with reference to gain modulation studies (e.g. Pesaran et al. 2010, Bremner and Andersen, 2012). We applied this subtype analysis as a criterion for identifying modulation in the neuronal population, rather than to sort neurons into distinct cell types. We will update the Methods in the revised version of the manuscript.
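      For readers unfamiliar with the three subtypes, the following minimal sketch shows how each one alters a classical cosine tuning curve. The linear dependence on target speed and the parameter names (`pd_shift`, `gain`, `addition`) are assumptions made for illustration; the manuscript's actual parameterization may differ.

```python
import math


def cosine_tuning(theta, baseline, depth, preferred_direction):
    """Classical cosine tuning: rate = baseline + depth * cos(theta - PD)."""
    return baseline + depth * math.cos(theta - preferred_direction)


def modulated_rate(theta, speed, baseline, depth, preferred_direction,
                   pd_shift=0.0, gain=0.0, addition=0.0):
    """Cosine tuning with three hypothetical, linear-in-speed modulations.

    pd_shift: rotation of the preferred direction (radians per unit speed)
    gain:     fractional change in tuning depth per unit speed
    addition: firing-rate offset per unit speed
    """
    return cosine_tuning(
        theta,
        baseline + addition * speed,             # "addition" shifts the whole curve
        depth * (1.0 + gain * speed),            # "gain" scales the tuning depth
        preferred_direction + pd_shift * speed,  # "PD shift" rotates the peak
    )
```

      Because all three terms can be nonzero for the same unit, a single neuron can show mixed modulation in this formulation, consistent with the point that the subtypes are a descriptive criterion rather than distinct cell types.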

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      Mixed selectivity, or more precisely mixed modulation, is indeed a significant feature of the neuronal population in the present study. The purpose of the subtype analysis was to serve as a criterion for the potential modulation mechanisms. However, the results appear to form a spectrum rather than clusters. The analysis still offers some insight into the distribution of modulation, and we will refine the description in the next version. In the current version, we examined single-unit tuning and the population neural state with sliding windows, focusing on the period around movement onset (MO) because of the emergence of a ring-like structure. We will clarify the choice of windows and assess how the results depend on them in the next version. It’s a great suggestion to consider the role of rotating tuning curves in neural dynamics during interception.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We will introduce the relevant research in the next version of the manuscript.

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      Great suggestion! However, it’s hard to implement as the implanted arrays have been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      Nice suggestion! The goodness of fit of the simple model (motor direction only) is much lower than that of the complex model (including target speed). We will update the results in the next version.
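
      The comparison discussed in this exchange can be illustrated with a minimal, dependency-free sketch. It is not the authors' analysis: the data are synthetic, the speed values are in arbitrary units, and the names `direction_only`, `direction_and_speed`, and `r_squared` are hypothetical. The sketch fits a cosine-tuning model with and without target-speed terms by ordinary least squares and compares the variance explained.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(design, y):
    """Fit y by ordinary least squares (normal equations) and return R^2."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def direction_only(theta):
    # Classic cosine tuning: baseline plus cos/sin of reach direction.
    return [1.0, math.cos(theta), math.sin(theta)]


def direction_and_speed(theta, speed):
    # Adds an additive speed term and speed-dependent gain/shift of the tuning.
    c, s = math.cos(theta), math.sin(theta)
    return [1.0, c, s, speed, speed * c, speed * s]


# Synthetic neuron (assumption): gain and offset both vary with target speed.
rng = random.Random(0)
speeds = [-2.0, -1.0, 0.0, 1.0, 2.0]  # arbitrary units, not the task's values
trials = []
for _ in range(500):
    theta = rng.uniform(0.0, 2.0 * math.pi)
    speed = rng.choice(speeds)
    rate = 20.0 + (10.0 + 3.0 * speed) * math.cos(theta - 0.5) + 2.0 * speed
    trials.append((theta, speed, rate + rng.gauss(0.0, 1.0)))

rates = [t[2] for t in trials]
r2_simple = r_squared([direction_only(th) for th, _, _ in trials], rates)
r2_full = r_squared([direction_and_speed(th, sp) for th, sp, _ in trials], rates)
print(f"R^2 direction only: {r2_simple:.3f}; direction + speed: {r2_full:.3f}")
```

      Because the two models are nested, the in-sample R^2 of the larger model can never be lower, so a real comparison would also need cross-validation or a test that penalizes the extra parameters.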

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      Good point. We will try unsupervised methods.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. We will test the decoder in other epochs.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of a SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In this study, we discretized the reach direction as in previous studies (Li et al., 2018, 2022) and considered discrete decoding sufficient to show the interaction of sensory and motor variables. In future studies, we will try continuous decoding of hand kinematics.

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We will perform decoding analysis on the RNN units to verify whether sensory and motor variables interact as they do in the real data, and we will also run the canonical correlation or Procrustes analysis.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for recognizing our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      The present study shows sensory modulation of motor tuning in single units and of the neural state during the motor execution period. Unfortunately, the findings were restricted to certain time windows. We are still working on this topic and hope to address the related questions in follow-up studies.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation-through-dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      Great idea! We are working on this and are close to completing the picture.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")--this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Nice suggestion. Target-speed modulation mainly influences PC3, which is consistent with the ‘null space’ hypothesis. We will try other dimensionality-reduction methods (e.g. dPCA, Manopt) to determine the potent and null spaces.

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which can be misleading. The 43% of unclassified nodes includes inactive ones; when only active (task-related) nodes are included, the proportion of unclassified nodes would be much lower. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We tried moving neural states from one ring to another without changing the directional cluster, but this perturbation didn’t have a significant influence on network performance as expected. We will check this result again and try perturbations in the delay period.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the « neural population » resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thanks for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We will highlight the resampling results as an important control in the next version of the manuscript.

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. Regrettably, we did not dissect why the representation forms or which factors influence it in the current version. We previously tried several combinations of inputs: in the network that received only motor intention and GO inputs, there were rings but no tilt related to target speed; in the network that received only target location and GO inputs, there were ring-like structures but no clear directional clusters. We will check these results and try alternative models in the next version. In future studies, we will examine the influence of network setup details.

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for the great suggestions. With the perturbations, we intended to explore the contribution of interactions between certain subpopulations. We tried ablation experiments, but the result was not significant. This is probably because most units had mixed selectivity, so the units with only a single type of modulation were too few for bootstrapping, or because random sampling from a single subpopulation (bearing mixed selectivity) could produce repeated samples. We will consider these suggestions carefully in the revised version.

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting and open questions. We will discuss this topic further in the next version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Semenova et al. have studied a large cross-sectional cohort of people living with HIV on suppressive ART, N=115, and performed high dimensional flow cytometry to then search for associations between immunological and clinical parameters and intact/total HIV DNA levels.

      A number of interesting data science/ML approaches were explored on the data and the project seems a serious undertaking. However, like many other studies that have looked for these kinds of associations, there was not a very strong signal. Of course, the goal of unsupervised learning is to find new hypotheses that aren't obvious to human eyes, but I felt in that context, there were (1) results slightly oversold, (2) some questions about methodology in terms mostly of reservoir levels, and (3) results were not sufficiently translated back into meaning in terms of clinical outcomes.

      We appreciate the reviewer’s perspective.  In our revised version of the manuscript, we have attempted to address these concerns by more adequately explaining the limitations of the study and by more thoroughly discussing the context of the findings.  We are not able to associate the findings with specific clinical outcomes for individual study participants but we speculate about the overall biological meaning of these associations across the cohort.  We cannot disagree with the reviewer, but we find the associations statistically significant, potentially reflecting real biological associations, and forming the basis for future hypothesis testing research. 

      Strengths:

      The study is evidently a large and impressive undertaking and combines many cutting-edge statistical techniques with a comprehensive experimental cohort of people living with HIV, notably inclusive of populations underrepresented in HIV science. A number of intriguing hypotheses are put forward that could be explored further. Sharing the data could create a useful repository for more specific analyses.

      We thank the reviewer for this assessment.

      Weaknesses:

      Despite the detailed experiments and methods, there was not a very strong signal for the variable(s) predicting HIV reservoir size. The Spearman coefficients are ~0.3, (somewhat weak, and acknowledged as such) and predictive models reach 70-80% prediction levels, though sometimes categorical variables are challenging to interpret.

      We agree with the reviewer that individual parameters are only weakly correlated with the HIV reservoir, likely reflecting the complex and multi-factorial nature of reservoir/immune cell interactions.  Nevertheless, these associations are statistically significant and form the basis for functional testing in viral persistence.
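
      For readers less familiar with the statistic under discussion, the following is a minimal sketch of a Spearman rank correlation with average ranks for ties. It is an illustration only, not the code used in the study, and the function names `average_ranks`, `pearson`, and `spearman` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho, rho ** 2)
```

      Squaring the coefficient gives the share of rank variance the two variables have in common, so a coefficient near 0.3 corresponds to roughly 9%, which is one way to read "weak" in this context.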

      There are some questions about methodology, as well as some conclusions that are not completely supported by results, or at minimum not sufficiently contextualized in terms of clinical significance. On associations: the false discovery rate correction was set at 5%, but data appear underdetermined with fewer observations than variables (144 variables > 115 participants), and it isn't always clear if/when variables are related (e.g., inverses of one another, for instance, %CD4 and %CD8).

      When deriving a list of cell populations whose frequency would be correlated with the reservoir, we focused on well-defined cell types for which functional validation exists in the literature to consider them as distinct cell types.  For many of the populations, gating based on combinations of multiple markers leads to recovery of very few cells, and so we excluded some potential combinations from the analysis.  We are also making our raw data available for others to examine and find associations not considered by our manuscript.
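
      The 5% false discovery rate correction raised in this comment can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. That the study used this particular procedure is an assumption; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis whose p-value is among the cutoff smallest.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten hypothetical marker/reservoir associations.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))
```

      The procedure controls the false discovery rate under independence or positive dependence of the tests. With strongly related variables such as %CD4 and %CD8, a more conservative variant (for example Benjamini-Yekutieli) is a common alternative.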

      The modeling of reservoir size was unusual; typically, intact and defective HIV DNA are analyzed on a log10 scale (both for decays and predicting rebound). Also, sometimes in this analysis levels are normalized (presumably to max/min?, e.g. S5), and given the large within-host variation of levels we see in other works, it is not trivial to predict any downstream impact of normalization across population vs within-person.

      We have repeated the analysis using log10 transformed data and the new figures are shown in Figure 1 and S2-S5.

      Also, the qualitative characterization of low/high reservoir is not standard and naturally will split by early/later ART if done as above/below median. Given the continuous nature of these data, it seems throughout that predicting above/below median is a little hard to translate into clinical meaning.

      Our ML models included time before ART as a variable in the analysis, and this was not found to be a significant driver of the reservoir size associations, except for the percentage of intact proviruses (see Figure 2C). Furthermore, we analyzed whether any of the reservoir correlated immune variables were associated with time on ART and found that, although some immune variables are associated with time on therapy, this was not the case for most of them (Table S4). We agree that it is challenging to translate above or below median into clinical meaning for this cohort, but we emphasize that this study is primarily a hypothesis generating approach requiring additional validation for the associations observed.  We attempted to predict reservoir size as a continuous variable using the data and this approach was not successful (Figure S13). We believe that a significantly larger cohort will likely be required to generate a ML model that can accurately predict the reservoir as a continuous variable.  We have added additional discussion of this to the manuscript.

      Lastly, the work is comprehensive and appears solid, but the code was not shared to see how calculations were performed.

      We now provide a link to the code used to perform the analyses in the manuscript, https://github.com/lesiasemenova/ML_HIV_reservoir.

      Reviewer #2 (Public Review):

      Summary:

      Semenova et al. performed a cross-sectional analysis of host immunophenotypes (using flow cytometry) and the peripheral CD4+ T cell HIV reservoir size (using the Intact Proviral DNA Assay, IPDA) from 115 people with HIV (PWH) on ART. The study mostly highlights the machine learning methods applied to these host and viral reservoir datasets but fails to interpret these complex analyses into (clinically, biologically) interpretable findings. For these reasons, the direct translational take-home message from this work is lost amidst a large list of findings (shown as clusters of associated markers) and sentences such as "this study highlights the utility of machine learning approaches to identify otherwise imperceptible global patterns" - lead to overinterpretation of their data.

      We have addressed the reviewer’s concern by modifications to the manuscript that enhance the interpretation of the findings in a clinical and biological context.

      Strengths:

      Measurement of host immunophenotyping measures (multiparameter flow cytometry) and peripheral HIV reservoir size (IPDA) from 115 PWH on ART.

      Major Weaknesses:

      (1) Overall, there is little to no interpretability of their machine learning analyses; findings appear as a "laundry list" of parameters with no interpretation of the estimated effect size and directionality of the observed associations. For example, Figure 2 might actually give an interpretation of each X increase in immunophenotyping parameter, we saw a Y increase/decrease in HIV reservoir measure.

      We have added additional text to the manuscript in which we attempt to provide more immunological and clinical interpretation of the associations.  We also have emphasized that these associations are still speculative and will require additional validation.  Nevertheless, our data should provide a rich source of new hypotheses regarding immune system/reservoir interaction that could be tested in future work.

      (2) The correlations all appear to be relatively weak, with most Spearman R in the 0.30 range or so.

      We agree with the reviewer that the associations are mostly weak, consistent with previous studies in this area. This is likely an inherent feature of the underlying biology: the reservoir is likely associated with the immune system in complex ways and involves stochastic processes that will limit the predictability of reservoir size using any single immune parameter. We have added additional text to the manuscript to make this point clearer.

      (3) The Discussion needs further work to help guide the reader. The sentence: "The correlative results from this present study corroborate many of these studies, and provide additional insights" is broad. The authors should spend some time here to clearly describe the prior literature (e.g., describe the strength and direction of the association observed in prior work linking PD-1 and HIV reservoir size, as well as specify which type of HIV reservoir measures were analyzed in these earlier studies, etc.) and how the current findings add to or are in contrast to those prior findings.

      We have added additional text to the manuscript to help guide the readers through the possible biological significance of the findings and the context with respect to prior literature.

      (4) The most interesting finding is buried on page 12 in the Discussion: "Uniquely, however, CD127 expression on CD4 T cells was significantly inversely associated with intact reservoir frequency." The authors should highlight this in the abstract, and title, and move this up in the Discussion. The paper describes a very high dimensional analysis and the key takeaways are not clear; the more the author can point the reader to the take-home points, the better their findings can have translatability to future follow-up mechanistic and/or validation studies.

      We appreciate the reviewer’s comment.  We have increased the emphasis on this finding in the revised version of the manuscript.

      (5) The authors should avoid overinterpretation of these results. For example in the Discussion on page 13 "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy." It is highly unlikely that future studies will be performing the breadth of parameters resulting here and then use these directly for optimizing therapy.

      Our analyses indicate that membership of study participants in cluster 1 or cluster 2 can be fairly accurately determined by a small number of individual parameters (KLRG1 etc, Figure 4F), and measuring the cells of PWH with the degree of breadth used in this paper would not be necessary to classify PWH into these clusters. As such, we feel that it is not unrealistic to speculate that this finding could turn out to be clinically useful, if it becomes clear that the clusters are biologically meaningful.

      (6) There are only TWO limitations listed here: cross-sectional study design and the use of peripheral blood samples. (The subsequent paragraph notes an additional weakness which is misclassification of intact sequences by IPDA). This is a very limited discussion and highlights the need to more critically evaluate their study for potential weaknesses.

      We have expanded on the list of limitations discussed in the manuscript. In particular, we now address the size of the cohort, the composition with respect to different genders and demographics, lack of information for the timing of ART and the lack of information regarding intracellular transcriptional pathways.

      (7) A major clinical predictor of HIV reservoir size and decay is the timing of ART initiation. The authors should include these (as well as other clinical covariate data - see #12 below) in their analyses and/or describe as limitations of their study.

      All of the participants that make up our cohort were treated during chronic infection, and the precise timing of ART initiation is unclear in most of these cases.  We have added additional information to explain this in the manuscript and include this in the list of limitations.

      Reviewer #3 (Public Review):

      Summary:

      This valuable study by Semenova and colleagues describes a large cross-sectional cohort of 115 individuals on ART. Participants contributed a single blood sample which underwent IPDA, and 25-color flow with various markers (pre and post-stimulation). The authors then used clustering, decision tree analyses, and machine learning to look for correlations between these immunophenotypic markers and several measures of HIV reservoir volume. They identified two distinct clusters that can be somewhat differentiated based on total HIV DNA level, intact HIV DNA level, and multiple T cell cellular markers of activation and exhaustion.

      The conclusions of the paper are supported by the data but the relationships between independent and dependent variables in the models are correlative with no mechanistic work to determine causality. It is unclear in most cases whether confounding variables could explain these correlations. If there is causality, then the data is not sufficient to infer directionality (ie does the immune environment impact the HIV reservoir or vice versa or both?). In addition, even with sophisticated and appropriate machine learning approaches, the models are not terribly predictive or highly correlated. For these reasons, the study is very much hypothesis-generating and will not impact cure strategies or HIV reservoir measurement strategies in the short term.

      We appreciate the reviewer’s comments regarding the value of our study.  We fully acknowledge that the causal nature and directionality of these associations are not yet clear and agree that the study is primarily hypothesis generating in nature.  Nevertheless, we feel that the hypotheses generated will be valuable to the field.  We have added additional text to the manuscript to emphasize the hypothesis generating nature of this paper.

      Strengths:

      The study cohort is large and diverse in terms of key input variables such as age, gender, and duration of ART. Selection of immune assays is appropriate. The authors used a wide array of bioinformatic approaches to examine correlations in the data. The paper was generally well-written and appropriately referenced.

      Weaknesses:

      (1) The major limitation of this work is that it is highly exploratory and not hypothesis-driven. While some interesting correlations are identified, these are clearly hypothesis-generating based on the observational study design.

      We agree that the major goal of this study was hypothesis generating and that our work is exploratory in nature. Performing experiments with mechanism testing goals in human participants with HIV is challenging.  Additionally, before such mechanistic studies can be undertaken, one must have hypotheses to test. As such we feel our study will be useful for the field in helping to identify hypotheses that could potentially be tested.

      (2) The study's cross-sectional nature limits the ability to make mechanistic inferences about reservoir persistence. For instance, it would be very interesting to know whether the reservoir cluster is a feature of an individual throughout ART, or whether this outcome is dynamic over time.

      We agree with the reviewer’s comment. Longitudinal studies are challenging to carry out with a study cohort of this size, and addressing questions such as the one raised by the reviewer would be of great interest. We believe our study nevertheless has value in identifying hypotheses that could be tested in a longitudinal study.

      (3) A fundamental issue is that I am concerned that binarizing the 3 reservoir metrics in a 50/50 fashion is for statistical convenience. First, by converting a continuous outcome into a simple binary outcome, the authors lose significant amounts of quantitative information. Second, the low and high reservoir outcomes are not actually demonstrated to be clinically meaningful: I presume that both contain many (?all) data points above levels where rebound would be expected soon after interruption of ART. Reservoir levels would also have no apparent outcome on the selection of cure approaches. Overall, dividing at the median seems biologically arbitrary to me.

      The reviewer raises a valid point that the clinical significance of above or below median reservoir metrics is unclear, and that the size of the reservoir has potentially little relation to rebound and cure approaches.  In the manuscript, we attempted to generate models that can predict reservoir size as a continuous variable in Figure S13 and find that this approach performs poorly, while a binarized approach was more successful. As such we have included both approaches in the manuscript.  It is possible that future studies with larger sample sizes and more detailed measurements will perform better for continuous variable prediction.  While this is a fairly large study (n=115) by the standards of HIV reservoir analyses, it is a small study by the standards of the machine learning field, and accurate predictive ML models for reservoir size as a continuous variable will likely require a much larger set of samples/participants.  Nevertheless, we feel our work has value as a template for ML approaches that may be informative for understanding HIV/immune interactions and generates novel hypotheses that could be validated by subsequent studies.
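      The reviewer's concern about the median split can be illustrated with a minimal sketch. The values below are hypothetical and are not study data; `binarize_at_median` is an illustrative helper, not the authors' pipeline. The sketch only shows how a 50/50 split places near-identical measurements in opposite classes while grouping very different ones together.

```python
from statistics import median


def binarize_at_median(values):
    """Label each value 'high' if it lies above the cohort median, else 'low'."""
    cutoff = median(values)
    labels = ["high" if v > cutoff else "low" for v in values]
    return cutoff, labels


# Hypothetical total HIV DNA values (copies per million CD4 T cells); not study data.
reservoir = [120, 480, 510, 530, 2600, 9100]
cutoff, labels = binarize_at_median(reservoir)

for value, label in zip(reservoir, labels):
    print(f"{value:>5} -> {label}")
```

      In this toy cohort, 510 and 530 land in different classes although they differ by only 20 copies, whereas 530 and 9100 share a class. This is the loss of quantitative information the reviewer describes.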

      (4) The two reservoir clusters are of potential interest as high total and intact with low % intact are discriminated somewhat by immune activation and exhaustion. This was the most interesting finding to me, but it is difficult to know whether this clustering is due to age, time on ART, other co-morbidity, ART adherence, or other possible unmeasured confounding variables.

      We agree that this finding is one of the more interesting outcomes of the study. We examined a number of these variables for association with cluster membership, and these data are reported in Figure S8A-D.  Age, years of ART and CD4 Nadir were all clearly different between the clusters.   The striking feature of this clustering, however, is the clear separation between the two groups of participants, as opposed to a continuous gradient of phenotypes.  This could reflect a bifurcation of outcomes for people with HIV, dynamic changes in the reservoir immune interactions over time, or different levels of untreated infection.  It is certainly possible that some other unmeasured confounding variables contribute to this outcome and we have attempted to make this limitation clearer.

      (5) At the individual level, there is substantial overlap between clusters according to total, intact, and % intact between the clusters. Therefore, the claim in the discussion that these 2 cluster phenotypes may require different therapeutic approaches seems rather speculative. That said, the discussion is very thoughtful about how these 2 clusters may develop with consideration of the initial insult of untreated infection and / or differences in immune recovery.

      We agree with the reviewer that this claim is speculative, and we have attempted to moderate the language of the text in the revised version.

      (6) The authors state that the machine learning algorithms allow for reasonable prediction of reservoir volume. It is subjective, but to me, 70% accuracy is very low. This is not a disappointing finding per se. The authors did their best with the available data. It is informative that the machine learning algorithms cannot reliably discriminate reservoir volume despite substantial amounts of input data. This implies that either key explanatory variables were not included in the models (such as viral genotype, host immune phenotype, and comorbidities) or that the outcome for testing the models is not meaningful (which may be possible with an arbitrary 50/50 split in the data relative to median HIV DNA volumes: see above).

      We acknowledge that the predictive power of the models generated from these data is modest and we have clarified this point in the revised manuscript. As the reviewer indicates, this may result from the influence of unmeasured variables and possible stochastic processes.  The data may thus demonstrate a limit to the predictability of reservoir size which may be inherent to the underlying biology.  As we mention above, this study size (n=115) is fairly small for the application of ML methods, and an increased sample size will likely improve the accuracy of the models. At this stage, the models we describe are not yet useful as predictive clinical tools, but they are nonetheless useful as tools to describe the structure of the data and identify reservoir-associated immune cell types.

      (7) The decision tree is innovative and a useful addition, but does not provide enough discriminatory information to imply causality, mechanism, or directionality in terms of whether the immune phenotype is impacting the reservoir or vice versa or both. Tree accuracy of 80% is marginal for a decision tool.

      The reviewer is correct about these points.  In the revised manuscript, we have attempted to make it clear that we are not yet advocating using this approach as a decision tool, but simply a way to visualize the data and understand the structure of the dataset.  As we discuss above, the models will likely need to be trained on a larger dataset and achieve higher accuracy before use as a decision tool.

      (8) Figure 2: this is not a weakness of the analysis but I have a question about interpretation. If total HIV DNA is more predictive of immune phenotype than intact HIV DNA, does this potentially implicate a prior high burden of viral replication (high viral load &/or more prolonged time off ART) rather than ongoing reservoir stimulation as a contributor to immune phenotype? A similar thought could be applied to the fact that clustering could only be detected when applied to total HIV DNA-associated features. Many investigators do not consider defective HIV DNA to be "part of the reservoir" so it is interesting to speculate why these defective viruses appear to have more correlation with immunophenotype than intact viruses.

      We agree with the reviewer that this observation could reflect prior viral burden and we have added additional text to make this clearer.  Even so, we cannot rule out a model in which defective viral DNA is engaged in ongoing stimulation of the immune system during ART, leading to the stronger association between total DNA and the immune cell phenotypes. We hypothesize that the defective proviruses could potentially be triggering innate immune pattern recognition receptors via viral RNA or DNA, and a higher burden of the total reservoir leads to a stronger apparent association with the immune phenotype.  We have included text in the discussion about this hypothesis.

      (9) Overall, the authors need to do an even more careful job of emphasizing that these are all just correlations. For instance, HIV DNA cannot be proven to have a causal effect on the immunophenotype of the host with this study design. Similarly, immunophenotype may be affecting HIV DNA or the correlations between the two variables could be entirely due to a separate confounding variable.

      We have revised the text of the manuscript to emphasize this point, and we acknowledge that any causal relationships are, at this point, simply speculation. 

      (10) In general, in the intro, when the authors refer to the immune system, they do not consistently differentiate whether they are referring to the anti-HIV immune response, the reservoir itself, or both. More specifically, the sentence in the introduction listing various causes of immune activation should have citations. (To my knowledge, there is no study to date that definitively links proviral expression from reservoir cells in vivo to immune activation as it is next to impossible to remove the confounding possible imprint of previous HIV replication.) Similarly, it is worth mentioning that the depletion of intact proviruses is quite slow such that proviral expression can only be stimulating the immune system at a low level. Similarly, the statement "Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" seems hard to dissociate from the persistence of immune cells that were reactive to viremia.

      We updated the text of the manuscript to address these points and have added additional citations as per the reviewer’s suggestion.

      (11) Given the many limitations of the study design and the inability of the models to discriminate reservoir volume and phenotype, the limitations section of the discussion seems rather brief.

      We have now expanded the limitations section of the discussion and added additional considerations. We now include a discussion of the study cohort size, composition and the detail provided by the assays.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few specific comments:

      "This pattern is likely indicative of a more profound association of total HIV DNA with host immunophenotype relative to intact HIV DNA."

      Most studies I have seen (e.g. single cell from Lichterfeld/Yu group) show intact proviruses are generally more activated/detectable/susceptible to immune selection, so I have a hard time thinking defective proviruses are actually more affected by immunotype.

      We hypothesize that this association is actually occurring in the opposite direction: that the defective proviruses are having a greater impact on the immune phenotype, due to their greater number and potential ability to engage innate or adaptive immune receptors. We have clarified this point in the manuscript.

      "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy."

      I find this a bit of a reach, given that the definition of 2 categories depended on the total size.

      We have modified the language of this section to reduce the level of speculation.

      "This study is cross-sectional in nature and is primarily observational, so caution should be used interpreting findings associated with time on therapy".

      I found this an interesting statement because ultimately time on ART shows up throughout the analysis as a significant predictor, do you mean something about how time on ART could indicate other confounding variables like ART regimen or something?

      We have rephrased this comment to avoid confusion.  We were simply trying to make the point that we should avoid speculating about longitudinal dynamics from cross sectional data.

      "As expected, the plots showed no significant correlation for intact HIV DNA versus years of ART (Figure 1B), while total reservoir size was positively correlated with the time of ART (Figure 1A, Spearman r = 0.31)."

      Is this expected? Studies with longitudinal data almost uniformly show intact decay, at least for the first 10 or so years of ART, and defective/total stability (or slight decay). Also, this should probably read "time on ART", so as not to confuse it with the duration of infection before ART.

      We have updated the language of this section to address this comment.  We have avoided comparing our data with respect to time on ART to longitudinal studies for reasons given above.
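      For readers unfamiliar with the statistic quoted above, the following is a minimal sketch of a Spearman rank correlation, written with the standard library only. The input values are hypothetical and are not study data, and the helper names (`average_ranks`, `pearson`, `spearman`) are illustrative rather than taken from the authors' code. As the response notes, a coefficient computed across individuals in a cross-sectional cohort describes an association between participants and does not by itself describe within-person dynamics over time.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cross-sectional values, one pair per participant; not study data.
years_on_art = [1, 3, 5, 8, 12, 20]
total_hiv_dna = [300, 250, 900, 700, 1500, 4000]
rho = spearman(years_on_art, total_hiv_dna)
print(f"Spearman rho = {rho:.3f}")
```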

      On dimensionality reduction, as this PaCMAP seems a relatively new technique (vs tSNE and UMAP which are more standard, but absolutely have their weaknesses), it does seem important to contextualize. I think it would still be useful to show PCA and assess the % variance of each additional dimension to assess the effective dimensionality. It would be helpful to show a plot of % variance by # components to see if there is a cutoff somewhere, and if PaCMAP is really picking this up to determine the 2 dimensions/2 clusters is ideal. Figure 4B ultimately shows a lot of low/high across those clusters, and since low/high is defined categorically it's hard to know which of those dots are very close to the other categories.

      We have added this analysis to the manuscript – found in Figure S9. The PCA plot indicates that members of the two clusters also separate on PCA although this separation is not as clear as for the PaCMAP plot.

      Minor comments on writing etc:

      Intro

      -Needs some references on immune activation sequelae paragraph.

      We have added some additional references to this section.

      -"promote the entry of recently infected cells into the reservoir" -- that is only one possible mechanistic explanation, it's not unreasonable but it seems important to keep options open until we have more precise data that can illuminate the mechanism of the overabundance.

      We have modified the text to discuss additional hypotheses.

      -You might also reference Pankau et al Ppath for viral seeding near the time of ART.

      We have added this reference.

      -"Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" - this was unclear to me, do you mean HIV-specific cells that act against HIV during ART? I think most studies show immunity against HIV (CD8 and CD4) wanes over time during ART.

      The Goonetilleke lab has recently generated data indicating that antiviral T cell responses are remarkably stable over time on ART, but we agree with the reviewer that the idea that ongoing antigen expression in the reservoir maintains these cells is speculative.  We have modified the text to make this point clearer.

      -Overall I think the introduction lacked a little bit of definitional precision: i.e. is the reservoir intact vs replication competent vs all HIV DNA and whether we are talking about PWH on long-term ART and how long we should be imagining? The first years of ART are certainly different than later, in terms of dynamics. The ultimate implications are likely specific for some of these categorizations.

      -"persistent sequelae of the massive disruptions to T cell homeostasis and lymphoid structures that occur during untreated HIV infection" needs a lot more context/referencing. For instance, Peter Hunt showed a decrease in activation after ART a long time ago.

      -Heather Best et al show T cell clonality stays perturbed after ART.

      We have updated the text of the introduction and added references to address the reviewer’s comments.

      Results

      -It would be important to mention the race of participants and any information about expected clades of acquired viruses, this gets mentioned eventually with reference to the Table but the breakdown would be helpful right away.

      We have added this information to the results section.

      -"performed Spearman correlations", may be calculated or tested?

      We have corrected the language for this sentence.

      Comments on figures:

      -Figure 1 data on linear scale (re discussion above) -- hard to even tell if there is a decay (to match with all we know from various long-term ART studies).

      -Figure 4 data is shown on ln (log_e) scale, which is hard to interpret for most people.

      -Figures 4 C,D, and E should have box plots to visually assess the significance.

      -Figure 4B legend says purple/pink but I think the colors are different in the plot, could be about transparency

      -Figure 5 it is now not clear if log_e(?).

      -Figure 6 "HIV reservoir characteristics" might be better to make this more explicit. Do you mean for instance in the 6B title Total HIV DNA per million CD4+ T cells I think?

      We have made these modifications.

      Reviewer #2 (Recommendations For The Authors):

      Minor Weaknesses:

      (1) The Introduction is too long and much of the text is not directly related to the study's research question and design.

      We have streamlined the introduction in the revised manuscript.

      (2) While no differences were seen by age or race, according to the authors, this is unlikely to be useful since the numbers are so small in some of these subcategories. Results from sensitivity analyses (e.g., excluding these individuals) may be more informative/useful.

      We agree that the lower numbers of participants for some subgroupings make it challenging to know for sure if there are any differences based on these variables.  We have added text to clarify this. We have added age, race and gender to the LOCO analysis and to the variable inflation importance analysis (Table S5).

      (3) For Figure 4, based on what was described in the Results section of the manuscript, the authors should clarify that the figures show results for TOTAL HIV DNA only (not intact DNA): "Dimension reduction machine learning approaches identified two robust clusters of PWH when using total HIV DNA reservoir-associated immune cell frequencies (Figure 4A), but not for intact or percentage intact HIV DNA (Figure 4B and 4C)".

      We have added this information.

      (4) The statement on page 5, first paragraph, "Interestingly, when we examined a plot of percent intact proviruses versus time on therapy (Figure 1C), we observed a biphasic decay pattern," is not new (Peluso JCI Insight 2020, Gandhi JID 2023, McMyn JCI 2023). Prior studies have clearly demonstrated this biphasic pattern and should be cited here, and the sentence should be reworded with something like "consistent with prior work", etc.

      We have added citations to these studies and rephrased this comment.

      (5) The Cohort and sample collection sections are somewhat thin. Further details on the cohort details should include at the very minimum some description of the timing of ART initiation (is this mostly a chronic-treated cohort?) and important covariate data such as nadir CD4+ T cell count, pre-ART viral load, duration of ART suppression, etc.

      The cohort was treated during chronic infection, and we have clarified this in the manuscript.  Information regarding CD4 nadir and years on ART are included in Table 1.  Unfortunately, pre-ART viral load was not available for most members of this cohort, so we did not use it for analyses. The partial pre-ART viral load data is included with the dataset we are making publicly available.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      (1) What is meant by CD4 nadir? Is this during primary infection or the time before ART initiation?

      We have clarified this description in the manuscript.  This term refers to the lowest CD4 count recorded during untreated infection.

      (2) The authors claim that determinants of reservoir size are starting to emerge but other than the timing of ART, I am not sure what studies they are referring to.

      We have updated the language of this section.  We intended to refer to studies looking at correlates of reservoir size, and feel that this is a more appropriate term than ‘determinants’.

      (3) The discussion does not tie in the model-generated hypotheses with the known mechanisms that sustain the reservoir: clonal proliferation balanced by death and subset differentiation. It would be interesting to tie in the proposed reservoir clusters with these known mechanisms.

      We have added additional text to the manuscript to address these mechanisms.

      (4) Figure 1: Total should be listed as total HIV DNA.

      We have updated this in the manuscript.

      (5) Figure 1C: Worth mentioning the paper by Reeves et al which raises the possibility that the flattening of intact HIV DNA at 9 years may be spurious due to small levels of misclassification of defective as intact.

      We have added this reference.

      (6) "Total reservoir frequency" should be "total HIV DNA concentration"

      We respectfully feel that “frequency” is a more accurate term than “concentration”, since we are expressing the reservoir as a fraction of the CD4 T cells, while “concentration” suggests a denominator of volume.

      (7) Figure S2-5: label y-axis total HIV DNA.

      We have updated this figure.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and other reviewers) for this excellent suggestion. We plan to carry out a CUT&Tag assay to obtain a comprehensive epigenetic landscape of H2Bub1 in the presence of the HMD or full-length Rtf1 under conditions in which overexpression of the HMD failed to rescue the phenotypes of the rtf1Δ mutant, such as growth at 39 °C.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data is quite compelling, demonstrating that the Rtf1-dependent functions require only this histone modifying domain of Rtf1, and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein in the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      This is a fair point raised by the reviewer, and we are going to test whether these biological effects are conserved across the two species. We will assess the effects of RTF1 deletion on bisexual mating hyphal formation in the C. neoformans H99 background and on capsule and melanin production in the C. deneoformans XL280 background.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We are going to confirm the conclusions related to capsule induction under multiple capsule-inducing conditions, including Dulbecco’s Modified Eagle’s Medium (DMEM), Littman’s medium, and 10% fetal bovine serum (FBS) agar medium [1].

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. As mentioned in the response to Reviewer 1, our CUT&Tag assay will provide evidence to address these questions.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer, and as mentioned in our responses to Reviewers 1 and 2, we are going to conduct a CUT&Tag assay to investigate the relationship between Rtf1-mediated H2Bub1 and the expression of particular genes.

      (1) Jang, E.-H., et al., Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): p. e02866-22.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 21 cell types, encompassing immune cells, endothelial cells, and fibroblasts. They then coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk RNA-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and resulted in robust results. The newly-generated, validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      A few aspects can be improved to clarify the value and applicability of EPIC-ATAC and the transparency of the benchmarking analysis.

      (1) Most of the validation results in the main text assess the methods on all cell types together, by showing the correlation, RMSE, and scatterplots of the estimated vs. true cell fractions. This approach is valuable for showing the overall method performance and for detecting systematic biases and noisy estimates. However, it provides very limited insights regarding the capability of the methods to estimate the individual cell types, which is the ultimate aim of deconvolution analysis. This limitation is exacerbated for rare cell types, which could even have a negative correlation with the ground truth fractions, but not weigh much on the overall RMSE and correlation. I would suggest integrating into the main text and figures an in-depth assessment of the individual cell types. In particular, it should be shown and discussed which cell types can be accurately quantified and which ones are less reliable.

      We thank the reviewer for raising this important point. Discussing the accuracy of EPIC-ATAC in predicting individual cell-type proportions would indeed be valuable in the main text. We have updated the text as follows.

      In the first version of our manuscript, we had a section called “T cell subtypes quantification reveals the ATAC-Seq deconvolution limits for closely related cell types” which highlighted that EPIC-ATAC performs poorly when predicting the proportions of closely related cell types, e.g., CD4+ T cell or CD8+ T cell subtypes. The section is now named “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type” and has been expanded to discuss the accuracy of EPIC-ATAC predictions within each major cell type.

      To do so, we represented in Figure 5A the performances of EPIC-ATAC in each cell type present in the benchmarking datasets from Figures 3 and 4. Additionally, we have kept in the supplementary figures the details of the correlation values and RMSE values within each cell type and for each tool (Supplementary Figures 9 and 10). The following text has been added in the main text to describe these analyses:

      “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type

      To investigate the impact of cell type abundance on the accuracy of ATAC-Seq deconvolution, we evaluated EPIC-ATAC predictions in each major cell type separately in the different benchmarking datasets (Figure 5A). NK cells, endothelial cells, neutrophils or dendritic cells showed lower correlation values. These values can be explained by the fact that these cell types are low-abundant in our benchmarking datasets (Figure 5A). For the endothelial cells and dendritic cells, the RMSE values associated to these cell types remain low. This suggests that while the predictions of EPIC-ATAC might not be precise enough to compare these cell-type proportions between different samples, the cell-type quantification within each sample is reliable. For the NK cells and the neutrophils, we observed more variability with higher RMSE values in some datasets which suggests that the markers and profiles for these cell types might be improved. Supplementary Figures 9 and 10 detail the performances of each tool when considering each cell type separately in the PBMC and the cancer datasets. As for EPIC-ATAC, the predictions from the other deconvolution tools are more reliable for the frequent cell types.”
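
      As an illustration of the point made in the quoted paragraph (a minimal sketch, not the authors' code), the snippet below computes the Pearson correlation and RMSE separately for each cell type. The cell-type names and fractions are invented toy values; they are chosen so that a low-abundance cell type shows a poor correlation across samples while its RMSE stays small.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def rmse(true, pred):
    """Root mean squared error between true and predicted fractions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def per_cell_type_metrics(true_fractions, predicted_fractions):
    """Compute both metrics for each cell type across samples."""
    return {
        cell_type: {
            "pearson": pearson(true_fractions[cell_type], predicted_fractions[cell_type]),
            "rmse": rmse(true_fractions[cell_type], predicted_fractions[cell_type]),
        }
        for cell_type in true_fractions
    }


# Hypothetical fractions for three samples (toy values, not from the paper).
true_fractions = {"T_cells": [0.2, 0.4, 0.6], "DCs": [0.01, 0.02, 0.03]}
predicted_fractions = {"T_cells": [0.25, 0.45, 0.65], "DCs": [0.03, 0.02, 0.01]}

metrics = per_cell_type_metrics(true_fractions, predicted_fractions)
```

      In this toy case the rare cell type has a negative correlation but a smaller RMSE than the abundant one, which is why the two metrics are worth reading together.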

      (2) In the benchmarking analysis, EPIC-ATAC is compared to several deconvolution methods, most of which were originally developed for transcriptomics data. This comparison is not completely fair unless their peculiarities and the limitations of tweaking them to work with ATAC-seq data are discussed. For instance, some methods (including the original EPIC) correct for cell-type-specific mRNA bias, which is not present in ATAC-seq data and might, thus, result in systematic errors.

      We thank the reviewer for this comment and have updated the results and methods sections as follows:

      We provide in the Materials and methods section, the paragraph “Benchmarking of the EPIC-ATAC framework against other existing deconvolution tools” which describes how each tool included in the benchmark was used in the ATAC-Seq context. We have added a reference to this section in the main text when introducing the first benchmarking analysis.

      For each tool, the main changes consisted in: (i) replacing the initial RNA-Seq profiles and markers by the EPIC-ATAC reference profiles and markers and (ii) providing as input a bulk ATAC-Seq dataset with matched ATAC-Seq features (the same approach as the one used in EPIC-ATAC was considered, see answer to the next comment). Having reference profiles/markers and an ATAC-Seq bulk query with matched features was the only requirement of the different deconvolution models to be able to run on ATAC-Seq data with the default methods parameters, except for quanTIseq. Indeed, this method, like EPIC, corrects its estimations for cell-type-specific mRNA content bias. We have disabled this option for the bulk ATAC-Seq deconvolution.

      We cannot, however, exclude that tuning the hyperparameters of each tool could have improved its performance. Also, for RNA-Seq data deconvolution, some of the methods apply specific feature filtering, e.g., the quanTIseq framework removes a manually curated list of noisy genes as well as aberrant immune genes identified in the TCGA data and ABIS uses immune-specific housekeeping genes. We can hypothesize that additional filtering could be explored for the ATAC-Seq deconvolution to improve the performance of the tools.

      We have clarified these points in the results section when introducing the benchmarking, in the methods and in the discussion section.

      (3) On a similar note, it could be made more explicit which adaptations were introduced in EPIC, besides the ad-hoc ATAC-seq signature, to make it applicable to this type of data.

      In the first version of the manuscript, we described the changes made to EPIC to perform bulk ATAC-Seq deconvolution in the Material and methods section in the paragraph “Running EPIC-ATAC on bulk ATAC-Seq data”. We have moved this paragraph to the results section, before the description of the evaluation of EPIC-ATAC in different datasets, and expanded it. The paragraph is the following:

      “EPIC-ATAC integrates the marker peaks and profiles into EPIC to perform bulk ATAC-Seq data deconvolution

      The cell-type specific marker peaks and profiles derived from the reference samples were integrated into the EPIC deconvolution tool (Racle et al., 2017; Racle and Gfeller, 2020). We will refer to this ATAC-Seq deconvolution framework as EPIC-ATAC. To ensure the compatibility of any input bulk ATAC-Seq dataset with the EPIC-ATAC marker peaks and reference profiles, we provide an option to lift over hg19 datasets to hg38 (using the liftOver R package) as the reference profiles are based on the hg38 reference genome. Subsequently, the features of the input bulk matrix are matched to our reference profiles’ features. To match both sets of features, we determine for each peak of the input bulk matrix the distance to the nearest peak in the reference profiles peaks. Overlapping regions are retained and the feature IDs are matched to their associated nearest peaks. If multiple features are matched to the same reference peak, the counts are summed. Before the estimation of the cell-type proportions, we transform the data following an approach similar to the transcripts per million (TPM) transformation which has been shown to be appropriate to estimate cell fractions from bulk mixtures in RNA-Seq data (Racle et al., 2017; Sturm et al., 2019). We normalize the ATAC-Seq counts by dividing counts by the peak lengths as well as samples depth and rescaling counts so that the counts of each sample sum to 10^6. In RNA-Seq based deconvolution, EPIC uses an estimation of the amount of mRNA in each reference cell type to derive cell proportions while correcting for cell-type-specific mRNA bias. For the ATAC-Seq based deconvolution these values were set to 1 to give similar weights to all cell-types quantifications. Indeed ATAC-Seq measures signal at the DNA level, hence the quantity of DNA within each reference cell type is similar.”
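
      The TPM-like normalization described in the quoted paragraph can be sketched as follows. This is an illustrative reading of the text, not the EPIC-ATAC implementation: the function name, the toy counts, and the peak lengths are assumptions, and the sketch handles a single sample whose peaks have already been matched to the reference features.

```python
def tpm_like_normalize(counts, peak_lengths, target_sum=1e6):
    """Length-normalize the peak counts of one sample, then rescale to target_sum."""
    if len(counts) != len(peak_lengths):
        raise ValueError("counts and peak_lengths must have the same length")
    # Step 1: divide each count by the length of its peak (reads per base pair).
    rates = [count / length for count, length in zip(counts, peak_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no signal in the matched peaks")
    # Step 2: rescale so that the sample sums to target_sum.
    # Any per-sample depth factor applied before this step cancels out here.
    return [rate / total * target_sum for rate in rates]


# Hypothetical sample: raw counts in three peaks and the peak lengths in bp.
normalized = tpm_like_normalize([10, 20, 30], [500, 1000, 500])
```

      With these toy inputs the first two peaks receive the same normalized value, because 10 reads over 500 bp and 20 reads over 1000 bp correspond to the same per-base signal.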

      (4) Given that the final applicability of EPIC-ATAC is on real bulk RNA-seq data, whose characteristics might not be completely recapitulated by pseudo-bulk samples, it would be interesting to see EPIC and EPIC-ATAC compared on a dataset with matched, real bulk RNA-seq and ATAC-seq, respectively. It would nicely complement the analysis of Figure 7 and could be used to dissect the commonalities and peculiarities of these two approaches.

      We thank the reviewer for raising this important point. EPIC-ATAC will be applied to real bulk ATAC-Seq data, and pseudobulk data indeed cannot fully recapitulate the bulk signals. Recently, a dataset composed of more than 100 samples with matched bulk RNA-Seq, bulk ATAC-Seq as well as matched flow cytometry data has been published by Morandini and colleagues in GeroScience in November 2023. We thus retrieved these data to compare the predictions obtained by EPIC-ATAC on the bulk ATAC-Seq data and the predictions of the original version of EPIC on the bulk RNA-Seq data to the cell-type quantification obtained by flow cytometry. We also assessed whether both modalities could be complementary using a simple approach averaging the predictions obtained from both modalities. The results of these analyses are summarized in Figure 7C and are described in the main text in the last paragraph of the paper:

      “We compared the predictions obtained using each modality to the flow cytometry cell-type quantifications. EPIC-ATAC predictions were better correlated with the flow cytometry measures for some cell types (e.g., CD8+, CD4+ T cells, NK cells) while this trend was observed with the EPIC-RNA predictions in other cell types (B cells, neutrophils, monocytes) (Figure 7C). We then tested whether the predictions obtained from both modalities could be combined to improve the accuracy of each cell-type quantification. Averaging the predictions obtained from both modalities shows a moderate improvement (Figure 7C), suggesting that the two modalities can complement each other.”

      Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections. The authors have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools.

      Weaknesses:

      Currently, the tool has a narrow applicability in that it estimates the percentage of immune cells in a bulk ATAC-seq experiment.

      Comments:

      (1) Has any benchmarking been done on the runtime of the tool? Although EPIC-ATAC seems to "win" in benchmarking metrics, sometimes the differences are quite small. If EPIC-ATAC takes forever to run, compared to another tool that is a lot quicker, might some people prefer to sacrifice 0.01 in correlation for a quicker running tool?

      We thank the reviewer for raising this point that was not addressed in the manuscript. We have added a supplementary figure (Supplementary Figure 8) which represents the CPU time used by each tool. The figure shows that all the tools could be run in less than 20 seconds on average. This figure is now referenced at the end of the benchmarking paragraphs.

      (2) In Figure 3B the data points look a bit squashed in the bottom-left corner. Could the plot be replotted with the data point spread out? There also seems to be some inter-patient variability. Could the authors comment on that?

      We have updated Figure 3B to increase the visibility of the dots in the bottom-left corner. To do so, we have limited the x and y axes to the maximum of the predicted proportions for the y axis and true proportions for the x axis.

      We also acknowledge that the accuracy of the predictions varies across samples. In particular, one sample (Sample4, star shape on Figure 3B) exhibits larger discrepancies between EPIC-ATAC predictions and the ground truth. To understand the lower performance, we have visualized our marker peaks in the five PBMC samples (Figure below). Based on this visualization, we can see that Sample4 might be an outlier: its cellular composition is similar to that of Sample2 and Sample5, yet it shows particularly high ATAC-Seq accessibility at the monocyte and dendritic cell markers. This can explain why EPIC-ATAC overestimates the proportions of the two populations in this case. We have added this visualization as a supplementary figure (Supplementary Figure 2) and have described it in the results section in the paragraph “EPIC-ATAC accurately estimates immune cell fractions in PBMC ATAC-Seq samples”.

      (3) Could the authors comment on the possibility of expanding EPIC-ATAC into more than a percentage prediction tool? Perhaps EPIC-ATAC could remove the immune cell signal from the bulk ATAC-seq data to "purify" the uncharacterised cells in silico, or generate pseudo-ATAC-seq tracks of the identified cell types.

      We thank the reviewer for this interesting question. As suggested by the reviewer, one approach to purify bulk genomics data using the cell-type proportions estimated by a cell-type deconvolution tool is to subtract the weighted sum of the signal observed in the reference data, with weights corresponding to the predicted proportions. We used this approach on the EPIC-ATAC predictions obtained from pseudobulks built from scATAC-Seq data from diverse cancer types coming from the Human Tumor Atlas Network (HTAN) (see also the answer to the first recommendation of Reviewer 1). This dataset allows us to compare, for a relatively large number of samples (a maximum of 25 samples in a cancer type cohort), the purified signal to the true signal derived from the single-cell data. The results are presented in the figure below, which shows that the correlations between the predicted and true signals are relatively good in most of the cancer types (blue boxplots). However, these correlation levels are lower than the ones obtained when comparing the signal obtained from the entire pseudobulk (red boxplots) with the true signal. This suggests that this purification approach leads to a signal that is less precise and accurate than the signal of the full cell mixture.

      Author response image 1.

      Boxplots of the correlation values obtained from the comparison of the bulk signal and the ground truth signal from the uncharacterized cells in each sample (red) and from the comparison of the predicted signal and the ground truth signal from the uncharacterized cells in each sample (blue).

      Also, note that in our simple approach, negative values can be obtained. The predicted signal will thus be difficult to interpret and to use in downstream analyses. Methods claiming to perform purification of bulk samples use more complex and dedicated algorithms. For example, Symphony (Burdziak et al., 2019) (cited in our introduction) uses single-cell RNA-Seq data in addition to the bulk chromatin accessibility data to infer cluster-specific accessibility profiles. Considering that EPIC was not designed for purification purposes, we decided not to include this analysis in the updated version of the manuscript.
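
      The simple subtraction approach discussed in this answer can be sketched as follows. This is a hedged illustration only: the function name, the reference profiles, the proportions, and the bulk values are invented, and whether the residual was further rescaled in the actual analysis is not stated in the response, so no rescaling is applied here.

```python
def subtract_reference_signal(bulk, reference_profiles, proportions):
    """Return the bulk signal minus the proportion-weighted sum of reference profiles."""
    residual = list(bulk)
    for cell_type, fraction in proportions.items():
        profile = reference_profiles[cell_type]
        for i, value in enumerate(profile):
            residual[i] -= fraction * value
    return residual


# Hypothetical reference profiles over three peaks and predicted proportions.
reference_profiles = {"T_cells": [10.0, 0.0, 2.0], "B_cells": [0.0, 8.0, 2.0]}
proportions = {"T_cells": 0.5, "B_cells": 0.25}
bulk = [6.0, 1.0, 4.0]

residual = subtract_reference_signal(bulk, reference_profiles, proportions)
```

      In this toy example the second peak ends up negative after subtraction, which is the interpretability problem noted above.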

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The original EPIC had two different signatures for application to blood or tumor RNA-seq. It is not clear instead if EPIC-ATAC applies with the same signature and framework to any tissue and disease context. This aspect should be clarified in the text.

      We thank the reviewer for raising this point which was not clear in the previous version of the manuscript. As in the original version of EPIC, in EPIC-ATAC two reference profiles and sets of markers are available, the PBMC reference and the TME reference. We used the PBMC reference profiles and markers to deconvolve the PBMC samples and the TME reference profiles and markers to deconvolve the cancer samples. We have clarified this point in the result section of the main text in the paragraph “ATAC-Seq data from sorted cell populations reveal cell-type specific marker peaks and reference profiles” as follows (added text underlined):

      “The resulting marker peaks specific only to the immune cell types were considered for the deconvolution of PBMC samples (PBMC markers). For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see the Material and methods, section 2). The latter filtering ensures the relevance of the markers in the TME context since cell-type specific TME markers are expected to be correlated in tumor bulk ATAC-Seq measurements (Qiu et al., 2021). 716 markers of immune, fibroblasts and endothelial cell types remained after the last filtering (defined as TME markers). Considering the difference in cell types and the different filtering steps applied on the PBMC and TME markers, we recommend to use the TME markers and profiles to deconvolve bulk samples from tumor samples and the PBMC markers and profiles to deconvolve PBMC samples.”

      We also note that when running EPIC-ATAC using the PBMC markers and the TME markers independently to perform the deconvolution of the cancer datasets, we see that overall the use of the TME markers leads to a better performance (Figure below).

      Figure legend: Correlation and RMSE values obtained when running EPIC-ATAC on each cancer dataset (points) using the PBMC (red) and the TME (blue) markers.

      To demonstrate that the TME markers can be applied to different cancer types, we have completed the evaluation of EPIC-ATAC on tumor samples by considering an additional dataset: the Human Tumor Atlas Network (HTAN) single-cell multiomic (scRNA-Seq and scATAC-Seq) dataset. We have processed this dataset and built scATAC-Seq pseudobulks for 7 cancer types, to which EPIC-ATAC was applied. This analysis has been summarized in Figure 4 and Supplementary Figure 4 and shows that EPIC-ATAC is applicable in a diverse set of tissues.

      (2) EPIC and EPIC-ATAC have a valuable feature, which is absent from most deconvolution methods: the estimation of unknown content. It would be informative for the users to understand from the benchmarking analysis whether this feature gives an advantage to EPIC-ATAC with respect to the other approaches.

      Indeed, among the tools that we included in our benchmarking analysis, only EPIC-ATAC and quanTIseq enable users to predict the proportions of cells that are not present in the reference profiles, i.e., the uncharacterized cells. For the other tools we thus fixed the estimated proportions of uncharacterized cells to 0. This approach provides a clear and significant advantage to EPIC-ATAC and to quanTIseq. For this reason, we also provide a version of the benchmarking in which we exclude the uncharacterized cells and rescale the true and estimated cell-type proportions to sum to 1. In this second benchmarking approach, EPIC-ATAC still outperforms some of the other deconvolution tools.
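
      The second benchmarking variant described above (excluding the uncharacterized cells and rescaling the remaining proportions to sum to 1) can be illustrated with a short sketch. The function name, the label `"uncharacterized"`, and the toy proportions are assumptions made for the example.

```python
def rescale_without_uncharacterized(proportions, unknown_label="uncharacterized"):
    """Drop the uncharacterized fraction and rescale the rest to sum to 1."""
    kept = {k: v for k, v in proportions.items() if k != unknown_label}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no characterized cell types with a positive proportion")
    return {k: v / total for k, v in kept.items()}


# Hypothetical estimated proportions for one sample.
estimated = {"T_cells": 0.3, "B_cells": 0.1, "uncharacterized": 0.6}
rescaled = rescale_without_uncharacterized(estimated)
```

      Applying the same function to both the true and the estimated proportions puts all tools on the same footing, whether or not they can predict an uncharacterized fraction.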

      We have clarified this point in the results section, in the paragraph “EPIC-ATAC accurately predicts fractions of cancer and non-malignant cells in tumor samples”.

      (3) The selection of the most discriminative markers is very well described in the text and beautifully illustrated in Figure 2. However, it is unclear why UMAP plots are used to represent cell-type similarities and dissimilarities. Would a linear dimensionality reduction approach like PCA be already sufficient to show these groups, especially considering the not-so-extreme dimensionality of the underlying data? In addition, a statistic that could be also considered to compare clusters to the cell type labels in the two scenarios is the Adjusted Rand Index (ARI).

      We thank the reviewer for this relevant comment. We initially used UMAP to facilitate the visualization of the different cell-type groups. However, it is true that the first three axes of the principal component analyses performed on each set of marker peaks already capture most of the structure in the data and that the use of UMAP can artificially enhance the separation between the different groups of cells. We have updated Figure 2B by replacing the UMAP scatter plots with 3D representations of the first three principal components of the PCA and have added in Supplementary Figure 1B the pairwise scatter plots of these first 3 principal components. On the main figures, we have also added the ARI metric comparing the cell-type annotation and the clustering obtained using the first 10 axes of the PCA and model-based clustering.
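
      For readers unfamiliar with the Adjusted Rand Index, the sketch below computes it from the pair-counting formula using only the Python standard library. It is an illustration of the metric, not the code used for Figure 2B, and the label vectors are toy values.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        return 1.0
    # Count pairs of items that fall in the same group, jointly and marginally.
    joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = rows * cols / comb(n, 2)
    max_index = (rows + cols) / 2
    if max_index == expected:
        return 1.0
    return (joint - expected) / (max_index - expected)


# Toy example: cluster IDs that reproduce the annotation up to renaming.
annotation = ["B", "B", "T", "T"]
ari_match = adjusted_rand_index(annotation, [0, 0, 1, 1])
ari_mismatch = adjusted_rand_index(annotation, [0, 1, 0, 1])
```

      The index is 1 when the clustering reproduces the annotation regardless of label names, and can be negative when agreement is worse than expected by chance.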

      (4) In the introduction, it is stated that "the reasonable cost and technical advantages of these protocols foreshadow an increased usage of ATAC-Seq in cancer studies". I would suggest adding a reference to justify this trend. Also, it should be discussed how ATAC-seq deconvolution compares to other types of deconvolution approaches applied to cheaper epigenetic data like methylation one (e.g. epidish, methylcc, tca, minfi).

      We have complemented this sentence with two references to justify the assertion: (i) a review published by Luo, Gribskov and Wang in 2022 showing the increasing number of ATAC-Seq studies in the field of cancer research, and (ii) a protocol paper from Grandi et al. published in 2022 on the state-of-the-art Omni-ATAC protocol for ATAC-sequencing which discusses the broad applicability and the technical advantages of ATAC-sequencing. Also in the preceding sentence, a recent ATAC-Seq protocol that can be applied to FFPE samples has been mentioned, FFPE samples being the most common samples in clinical cancer research.

      We agree with the reviewer that other epigenetic assays such as methylation assays are cost effective. However, ATAC-sequencing provides additional information on the epigenetic landscape of a sample’s genome and some questions regarding regulatory regions and transcription factor activity cannot be answered with methylation data. Methods that can be applied to ATAC-Seq data specifically are thus needed. Most of the cell-type deconvolution algorithms existing so far are applicable to RNA-Seq or methylation data. These algorithms often use similar methodological concepts, e.g., linear combination of the reference profiles for reference-based methods, which could be used in different modalities. However, methylation-based deconvolution tools often take as input a data format that is specific to methylation data, e.g., two-color microarray data (RGChannelSet R object) for the minfi deconvolution function (estimateCellCounts), or leverage methylation-specific information to perform the deconvolution. For example, methylCC uses a model based on latent variables representing binarized measures of the methylation status of cell-type specific regions (1 or 0 for clearly methylated or unmethylated regions). Such methods are more difficult to adapt than tools based on RNA-Seq data where the signal is quantified using read counts similarly to ATAC-Seq data.

      Nevertheless, some methods such as EPIdish or MethylCIBERSORT have proposed new methylation reference profiles and have used existing models that are not specific to methylation data to deconvolve the bulk data. In our work, we followed a similar approach where we propose new reference profiles specific to chromatin accessibility data, integrate them into an existing method, EPIC, and test them in other existing tools. Note that methylation reference profiles cannot be directly used for ATAC-Seq data deconvolution considering that methylation assays measure methylation status at CpG sites (dinucleotides) whereas ATAC-Seq measures the accessibility of regions spanning hundreds of base pairs.

      An analysis comparing the performance of methylation-based deconvolution and ATAC-Seq based deconvolution would be informative. However, such analysis is beyond the scope of our paper considering that none of the datasets used for our benchmarking provide these two modalities for the same samples.

      In the manuscript, we have supplemented the references for the methylation-based deconvolution tools with those mentioned in the previous paragraphs and by the reviewer, and have expanded the discussion as follows:

      “The comparison of EPIC-ATAC applied on ATAC-Seq data with EPIC applied on RNA-Seq data has shown that both modalities led to similar performances and that they could complement each other. Another modality that has been frequently used in the context of bulk sample deconvolution is methylation. Methylation profiling techniques such as methylation arrays are cost effective (Kaur et al., 2023) and DNA methylation signal is highly cell-type specific (Kaur et al., 2023; Loyfer et al., 2023). Considering that methylation and chromatin accessibility measure different features of the epigenome, additional analyses comparing and/or complementing ATAC-seq based deconvolution with methylation-based deconvolution could be of interest as future datasets profiling both modalities in the same samples become available.”

      (5) In the Results section, some methodological steps could be phrased in a bit more extensive way to let the reader understand the rationale and the actual approach. I recognize there is also a reference to the Methods section, where all methodologies are reported in detail, but some of the sentences are hard to understand due to their synthetic format, e.g.: "markers with potential residual accessibility in human tissues were then filtered out".

      We thank the reviewer for this comment and have followed the recommendation to expand the sentences that were too condensed. Text changes and additions are underlined below:

      “To limit batch effects, the collected samples were homogeneously processed from read alignment to peak calling. For each cell type, we derived a set of stable peaks observed across samples and studies, i.e. for each study, peaks detected in at least half of the samples were considered, and for each cell type, only peaks detected jointly in all studies were kept (see Materials and Methods, section 1).”

      “To filter out markers that could be accessible in other human cell-types than those included in our reference profiles, we used the human atlas study (K. Zhang et al., 2021), which identified modules of open chromatin regions accessible in a comprehensive set of human tissues, and we excluded from our marker list the markers overlapping these modules (Figure 1, box 3, see Materials and Methods section 2).”

      “For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008)  (Figure 1, box 4, see the Material and methods, section 2).”
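
      The stable-peak rule quoted above (per study, keep peaks detected in at least half of the samples; per cell type, keep only peaks retained in all studies) can be sketched as follows. This is a simplified illustration: peaks are represented by hypothetical string IDs, whereas real peaks are genomic intervals that would need overlap-based matching, and the study names and contents are invented.

```python
def stable_peaks(studies):
    """Return the peaks of one cell type that are stable within and across studies.

    `studies` maps a study name to a list of per-sample peak sets.
    """
    per_study = []
    for samples in studies.values():
        # Count in how many samples of this study each peak is detected.
        counts = {}
        for peaks in samples:
            for peak in set(peaks):
                counts[peak] = counts.get(peak, 0) + 1
        # Keep peaks detected in at least half of the samples of this study.
        threshold = len(samples) / 2
        per_study.append({peak for peak, n in counts.items() if n >= threshold})
    if not per_study:
        return set()
    # Keep only the peaks retained jointly in all studies.
    return set.intersection(*per_study)


# Hypothetical peak calls for one cell type in two studies.
studies = {
    "study_A": [{"p1", "p2"}, {"p1", "p3"}, {"p1", "p2"}],
    "study_B": [{"p1", "p2"}, {"p1"}, {"p1"}],
}
kept = stable_peaks(studies)
```

      Here `p2` passes the within-study filter in `study_A` but not in `study_B`, so only `p1` survives the cross-study intersection.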

      Also, following the comments and recommendations of Reviewer 1, we have: (i) moved the method section describing the adaptation of EPIC to ATAC-Seq data to provide more details in the results section (see answer to the third comment of Reviewer 1), (ii) clarified how the existing tools used in the benchmarking analyses were adapted for ATAC-Seq deconvolution (see answer to the second comment of Reviewer 1), and (iii) detailed how the comparison between our estimations of the infiltration levels in the samples from Kumegawa et al. and the estimations from the original study was performed (see answer to the seventh recommendation of Reviewer 1).

      (6) In the main text, it is stated that "the list of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from diverse cancer types from The Cancer Genome Atlas". It should be clarified if these are only solid cancers, or if blood cancers were also used.

      We have considered only the solid cancers and have clarified this point in the results section: “This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas”.

      (7) When reporting that "these predictions are consistent with the infiltration level estimations reported in the original publication", it should be mentioned how the infiltration levels were quantified in this publication and how this agreement was quantified. This would be important also to claim in the abstract that "EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes".

      We thank the reviewer for this comment and acknowledge that the agreement between the EPIC-ATAC predictions and the infiltration levels quantified in the original publication should be further described in the paper. We have expanded the text in the results section in the paragraph “EPIC-ATAC accurately infers the immune contexture in a bulk ATAC-Seq breast cancer cohort” to clarify this point. Additionally, we have added a panel in Figure 6 (panel A) which shows a good agreement between EPIC-ATAC predictions and the metric used in the original paper to evaluate the infiltration levels of different cell types.

      The added text is underlined below:

      “We applied EPIC-ATAC to a breast cancer cohort of 42 breast ATAC-Seq samples including samples from two breast cancer subtypes, i.e., 35 oestrogen receptor (ER)-positive human epidermal growth factor receptor 2 (HER2)-negative (ER+/HER2-) samples and 7 triple negative (TNBC) tumors (Kumegawa et al., 2023). No cell sorting was performed in parallel to the chromatin accessibility sequencing. For this reason, the authors used a set of cell-type-specific cis-regulatory elements (CREs) identified in scATAC-Seq data from similar breast cancer samples (Kumegawa et al., 2022) and estimated the amount of infiltration of each cell type by averaging the ATAC-Seq signal of each set of cell-type-specific CREs in their samples. We used EPIC-ATAC to estimate the proportions of different cell types of the TME. These predictions were then compared to the metric used by Kumegawa and colleagues in their study to infer levels of infiltration. A high correlation between the two metrics was observed for each cell type (Pearson’s correlation coefficient from 0.5 for myeloid cells to 0.94 for T cells, Figure 6A).”  

      (8) It should be made explicit if EPIC-ATAC quantifies mDC, pDC, or their sum.

      In our collection of reference ATAC-Seq samples, from which the markers and profiles were derived, mDCs and pDCs were both grouped under dendritic cells. EPIC-ATAC thus quantifies the total amount of dendritic cells, i.e., mDCs and pDCs combined. We have added a sentence in the main text to clarify this point:

      To identify robust chromatin accessibility marker peaks of cancer relevant cell types, we collected 564 samples of sorted cell populations from twelve studies including eight immune cell types (B cells […] dendritic cells (DCs) (mDCs and pDCs are grouped in this cell-type category) […] and  endothelial (Liu et al., 2020; Xin et al., 2020) cells (Figure 1 box 1, Figure 2A, Supplementary Table 1).

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check the naming of tools is done correctly e.g. ChIPSeeker has been spelled incorrectly in some instances throughout the manuscript.

      We thank the reviewer for pointing out this mistake and have corrected it in the main text.

    1. Author response:

      We thank the editor and reviewers for the time they spent reviewing our manuscript entitled ‘Overnight fasting facilitates safety learning by changing the neurophysiological response to relief from threat omission’ which was sent as an original paper for a potential publication in eLife.

      Since we take the reviewers' comments to heart and recognize the very complex picture formed by our previous and current results, we will take more time to rethink the paper. We will use this time to revisit the interpretation of the results of our previous behavioral study, the preregistration plan, and the findings of our current fMRI (replication) study.

      We aim to address the fundamental issues indicated by the reviewers as soon and as clearly as possible.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript considers a mechanistic extension of MacArthur's consumer-resource model to include chasing down food and potential encounters between the chasers (consumers) that lead to less efficient feeding in the form of negative feedback. After developing the model, a deterministic solution and two forms of stochastic solutions are presented, in agreement with each other. Finally, the model is applied to explain observed coexistence and rank-abundance data.

      We thank the reviewer for the accurate summary of our manuscript.

      Strengths:

      The application of the theory to natural rank-abundance curves is impressive. The comparison with the experiments that reject the competitive exclusion principle is promising. It would be fascinating to see if in, e.g. insects, the specific interference dynamics could be observed and quantified and whether they would agree with the model.

      The results are clearly presented; the methods adequately described; the supplement is rich with details.

      There is much scope to build upon this expansion of the theory of consumer-resource models. This work can open up new avenues of research.

      We appreciate the reviewer for the very positive comments. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      Following the reviewer’s suggestions, we have now used Shannon entropies to quantify the model comparison with experiments that reject the Competitive Exclusion Principle (CEP). Specifically, for each time point of each experimental or model-simulated community, we calculated the Shannon entropy using the formula:

      H(t) = −Σ<sub>i</sub> P<sub>i</sub>(t) log P<sub>i</sub>(t), where P<sub>i</sub>(t) is the probability that a consumer individual belongs to species C<sub>i</sub> at time t. The comparison of the Shannon entropy time series between the experimental data and the SSA results shown in Fig. 2D-E is presented in Appendix-fig. 7C-D. The time averages and standard deviations (δH) of the Shannon entropies for these experimental or SSA model-simulated communities are as follows:

      , ; ,

      , , .

      Meanwhile, we have calculated the time averages and standard deviations (δC<sub>i</sub>) of the species’ relative/absolute abundances for the experimental or SSA model-simulated communities shown in Fig. 2D-E, which are as follows:

      , ; , ; , , , , where the superscript “(R)” represents relative abundances.

      From the results of Shannon entropies shown in Author response image 1 (which are identical to those of Appendix-fig. 7C-D) and the quantitative comparison of the time average and standard deviation between the model and experiments presented above, it is evident that the model results in Fig. 2D-E exhibit good consistency with the experimental data. They share roughly identical time averages and standard deviations in both Shannon entropies and the species' relative/absolute abundances for most of the comparisons. All these analyses are included in the appendices and mentioned in the main text.

      Author response image 1.

      Shannon Entropies of the experimental data and SSA results in Fig. 2D-E, redrawn from Appendix-fig. 7C-D.
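      The entropy calculation described above can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the function name `shannon_entropy`, the toy abundance table, and the choice of the natural logarithm are assumptions (the logarithm base is not stated in the text).

```python
import numpy as np

def shannon_entropy(abundances):
    """Shannon entropy H(t) for each time point of a community time series.

    abundances : array of shape (n_timepoints, n_species), non-negative counts
    Returns H(t) in nats (natural logarithm, an assumption of this sketch).
    """
    a = np.asarray(abundances, dtype=float)
    p = a / a.sum(axis=1, keepdims=True)      # P_i(t): share of species i at time t
    safe_p = np.where(p > 0, p, 1.0)          # log(1) = 0, so empty species add 0
    return -(p * np.log(safe_p)).sum(axis=1)

# Toy two-species series: even, uneven, and single-species communities.
series = np.array([[50, 50],
                   [90, 10],
                   [100, 0]])
H = shannon_entropy(series)
H_mean, H_std = H.mean(), H.std()             # time average and standard deviation
```

      The time average and standard deviation computed at the end correspond to the summary statistics that the response compares between the experiments and the SSA simulations.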

      Weaknesses:

      I am questioning the use of carrying capacity (Eq. 4) instead of using nutrient limitation directly through Monod consumption (e.g. Posfai et al. who the authors cite). I am curious to see how these results hold or are changed when Monod consumption is used.

      We thank the reviewer for raising this question. To explain it more clearly, the equation combining the third equation in Eq. 1 and Eq. 4 of our manuscript is presented below as Eq. R1:

      where x<sub>il</sub> represents the population abundance of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, and κ<sub>l</sub> stands for the steady-state population abundance of species R<sub>l</sub> (the carrying capacity) in the absence of consumer species. With no consumer species, x<sub>il</sub> = 0 since C<sub>i</sub> = 0 (i = 1,…,S<sub>C</sub>), and thus R<sub>l</sub> = κ<sub>l</sub> at steady state (dR<sub>l</sub>/dt = 0).

      Eq. R1 for the case of abiotic resources is comparable to Eq. (1) in Posfai et al., which we present below as Eq. R2:

      where c<sub>i</sub> represents the concentration of nutrient i, and thus corresponds to our R<sub>l</sub> ; n<sub>σ</sub>(t) is the population of species σ, which corresponds to our C<sub>i</sub> ; s<sub>i</sub> stands for the nutrient supply rate, which corresponds to our ζl ; µi denotes the nutrient loss rate, corresponding to our is the coefficient of the rate of species σ for consuming nutrient i, which corresponds to our in Posfai et al. is the consumption rate of nutrient i by the population of species σ, which corresponds to our x<sub>il</sub>.

      In Posfai et al., is the Monod function: and thus

      In our model, however, since predator interference is not involved in Posfai et al., we need to analyze the form of x<sub>il</sub> presented in the functional form of x<sub>il</sub> ({R<sub>l</sub>},{C<sub>i</sub>}) in the case involving only chasing pairs. Specifically, for the case of abiotic resources, the population dynamics can be described by Eq. 1 combined with Eq. R1:

      where and . For convenience, we consider the case of S<sub>R</sub> \=1 where the Monod form was derived (Monod, J. (1949). Annu. Rev. Microbiol., 3, 371-394.). From , we have

      where , and l =1. If the population abundance of the resource species is much larger than that of all consumer species (i.e., ), then,

      and R<sub>l</sub><sup>(F)</sup> ≈ R<sub>l</sub>. Combined with Eq. R5, and noting that C<sub>i</sub> = C<sub>i</sub><sup>(F)</sup> + x<sub>il</sub>, we can solve for x<sub>il</sub>:

      with l = 1 since S<sub>R</sub> = 1. Comparing Eq. R6 with Eq. R3, and considering the symbol correspondence explained in the text above, it is now clear that our model reduces to the Monod consumption form in the case of S<sub>R</sub> = 1, for which the Monod form was derived.
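      The reduction to the Monod form can be checked numerically with a short sketch. Everything in it is an assumption made for illustration, since the equations themselves are not reproduced here: a single consumer and a single resource, a chasing-pair balance of the form a·C<sup>(F)</sup>·R<sup>(F)</sup> = (d + k)·x with hypothetical rate symbols a, d and k, the conservation relations C = C<sup>(F)</sup> + x and R = R<sup>(F)</sup> + x, and the half-saturation constant K = (d + k)/a.

```python
import math

def pairs_exact(C, R, K):
    """Steady-state chasing-pair abundance x from (C - x) * (R - x) = K * x.

    K = (d + k) / a is a hypothetical half-saturation constant.
    The smaller root of the quadratic is the physical one (x <= min(C, R)).
    """
    s = C + R + K
    return (s - math.sqrt(s * s - 4.0 * C * R)) / 2.0

def pairs_monod(C, R, K):
    """Monod-type approximation, valid when the resource is abundant (R >> C)."""
    return C * R / (R + K)

# Resource far more abundant than the consumer: the two expressions nearly agree.
x_exact = pairs_exact(C=10.0, R=1.0e4, K=100.0)
x_monod = pairs_monod(C=10.0, R=1.0e4, K=100.0)
rel_err = abs(x_exact - x_monod) / x_exact

# Comparable abundances: the Monod form overestimates the number of pairs.
x_exact_small = pairs_exact(C=10.0, R=10.0, K=1.0)
x_monod_small = pairs_monod(C=10.0, R=10.0, K=1.0)
```

      The second pair of values illustrates why the approximation R<sup>(F)</sup> ≈ R is needed: when the resource is not abundant, pairs deplete the free resource pool and the Monod form no longer matches the exact balance.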

      Following on the previous comment, I am confused by the fact that the nutrient consumption term in Eq. 1 and how growth is modeled (Eq. 4) are not obviously compatible and would be hard to match directly to experimentally accessible quantities such as yield (nutrient to biomass conversion ratio). Ultimately, there is a conservation of mass ("flux balance"), and therefore the dynamics must obey it. I don't quite see how conservation of mass is imposed in this work.

      We thank the reviewer for raising this question. Indeed, the population dynamics of our model must adhere to flux balance, with the most pertinent equation restated here as Eq. R7:

      Below is the explanation of how Eq. R7, and thus Eqs. 1 and 4 of our manuscript, adhere to the constraint of flux balance. The interactions and fluxes between consumer and resource species occur solely through chasing pairs. At the population level, the scenario of chasing pairs among consumer species C<sub>i</sub> and resource species R<sub>l</sub> is presented in the following expression:

      where the superscripts "(F)" and "(P)" represent the freely wandering individuals and those involved in chasing pairs, respectively, and "(+)" stands for the biomass gained by consumer C<sub>i</sub> from resource R<sub>l</sub>. In our manuscript, we use x<sub>il</sub> to represent the population abundance (or equivalently, the concentration, for a well-mixed system with a given size) of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, and thus the net flow from resource species R<sub>l</sub> to consumer species C<sub>i</sub> per unit time is k<sub>il</sub>x<sub>il</sub>. Since there is only one R<sub>l</sub> individual within the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, the net effect on the population dynamics of species R<sub>l</sub> is −k<sub>il</sub>x<sub>il</sub>. However, since a consumer individual from species C<sub>i</sub> could be much heavier than a species R<sub>l</sub> individual, and energy is dissipated when nutrients are converted into biomass, we introduce a mass conversion ratio w<sub>il</sub> in our manuscript. For example, if a species C<sub>i</sub> individual is ten times the weight of a species R<sub>l</sub> individual, then without energy dissipation the mass conversion ratio w<sub>il</sub> should be 1/10 (i.e., w<sub>il</sub> = 0.1); however, if half of the chemical energy is dissipated into heat during the conversion of nutrients into biomass, then w<sub>il</sub> = 0.1 × 0.5 = 0.05. Consequently, the net effect of the flux from resource species R<sub>l</sub> to consumer species C<sub>i</sub> per unit time on the population dynamics of C<sub>i</sub> is w<sub>il</sub>k<sub>il</sub>x<sub>il</sub>, and flux balance is clearly satisfied.

      For the population dynamics of a consumer species C<sub>i</sub>, we need to consider all the biomass influx from different resource species, and thus there is a summation over all resource species, which leads to the term Σ<sub>l</sub> w<sub>il</sub>k<sub>il</sub>x<sub>il</sub> in Eq. R7. Similarly, for the population dynamics of a resource species R<sub>l</sub>, we need to sum all the biomass outflow into different consumer species, resulting in the term −Σ<sub>i</sub> k<sub>il</sub>x<sub>il</sub> in Eq. R7.

      Consequently, Eq. R7 and our model satisfy the constraint of flux balance.
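      The bookkeeping behind this flux-balance argument can be illustrated with a small numerical sketch. The function names and the toy matrices below are hypothetical; the only quantities taken from the text are the per-pair flow k<sub>il</sub>x<sub>il</sub>, the mass conversion ratio w<sub>il</sub>, and the worked value w<sub>il</sub> = 0.1 × 0.5 = 0.05.

```python
import numpy as np

def consumer_gain(w, k, x):
    """Biomass influx to each consumer i: sum over resources l of w_il * k_il * x_il."""
    return (w * k * x).sum(axis=1)

def resource_loss(k, x):
    """Outflow from each resource l: sum over consumers i of k_il * x_il."""
    return (k * x).sum(axis=0)

# Toy system with 2 consumers (rows) and 2 resources (columns).
k = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # hypothetical capture rates k_il
x = np.array([[10.0, 20.0],
              [30.0, 40.0]])          # hypothetical chasing-pair abundances x_il

# Consumer ten times heavier than the resource, half the energy dissipated.
w = np.full_like(k, 0.1 * 0.5)        # w_il = 0.05 for every pair

gain = consumer_gain(w, k, x)         # per-consumer gain in consumer individuals
loss = resource_loss(k, x)            # per-resource loss in resource individuals
```

      With a uniform conversion ratio, the total consumer gain equals 0.05 times the total resource loss, which is the sense in which every unit of resource removed is accounted for on the consumer side.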

      These models could be better constrained by more data, in principle, thereby potential exists for a more compelling case of the relevance of this interference mechanism to natural systems.

      We thank the reviewer for raising this question. Indeed, our model could benefit from the inclusion of more experimental data. In our manuscript, we primarily set the parameters by estimating their reasonable range. Following the reviewer's suggestions, we have now specified the data we used to set the parameters. For example, in Fig. 2D, we set D<sub>2</sub> = 0.01 with τ = 0.4 days, resulting in an expected lifespan of Drosophila serrata in our model setting of τ/D<sub>2</sub> = 40 days, which roughly agrees with experimental data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To explain biodiversity and quantitatively illustrate the rank-abundance curves across diverse communities, the competitive differences across consumer species, exemplified by the coefficient of variation of the mortality rates (a key parameter influencing the rank-abundance curve), were estimated from experimental data in the reference article (Patricia Menon et al., Water Research (2003) 37, 4151) using the two-sigma rule (lines 344-347 in the appendices).

      Still, we admit that many factors other than intraspecific interference, such as temporal variation, spatial heterogeneity, etc., are involved in breaking the limits of CEP in natural systems, and it is still challenging to differentiate each contribution in wild systems. However, for the two classical experiments that break CEP (Francisco Ayala, 1969; Thomas Park, 1954), intraspecific interference could probably be the most relevant mechanism, since factors such as temporal variation, spatial heterogeneity, cross-feeding, and metabolic tradeoffs are not involved in those two experimental systems.

      The underlying frameworks, B-D and MacArthur are not properly exposed in the introduction, and as a result, it is not obvious what is the specific contribution in this work as opposed to existing literature. One needs to dig into the literature a bit for that.

      The specific contribution exists, but it might be more clearly separated and better explained. In the process, the introduction could be expanded a bit to make the paper more accessible, by reviewing key features from the literature that are used in this manuscript.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have now added a new paragraph and revised the introduction part of our manuscript (lines 51-67 in the main text) to address the relevant issues. Our paper is much improved as a result.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kang et al investigates how the consideration of pairwise encounters (consumer-resource chasing, intraspecific consumer pair, and interspecific consumer pair) influences the community assembly results. To explore this, they present a new model that extends the classical Beddington-DeAngelis (BD) phenomenological model by explicitly considering pairwise encounters and intraspecific interference among consumer individuals. They then connect the model with several experimental datasets.

      Strengths:

      They found that the negative feedback loop created by the intraspecific interference allows a diverse range of consumer species to coexist with only one or a few types of resources. Additionally, they showed that some patterns of their model agree with experimental data, including time-series trajectories of two small in-lab community experiments and the rank-abundance curves from several natural communities. The presented results here are interesting and present another way to explain how the community overcomes the competitive exclusion principle.

      We appreciate the reviewer for the positive comments and the accurate summary of our manuscript.

      Weaknesses:

      The authors only explore cases in which either interspecific or intraspecific interference exists. I believe they need to systematically investigate the case in which both interspecific and intraspecific interference exist. In addition, the text description, figures, and mathematical notations have to be improved to enhance the article's readability. I believe this manuscript can be improved by addressing my comments, which I describe in more detail below.

      We thank the reviewer for these valuable suggestions. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      (1) In nature, it is really hard for me to believe that only interspecific interference or intraspecific interference exists. I think a hybrid between interspecific interference and intraspecific interference is very likely. What would happen if both the interspecific and intraspecific interference existed at the same time but with different encounter rates? Maybe the authors can systematically explore the hybrid between the two mechanisms by changing their encounter rates. I would appreciate it if the authors could explore this route.

      We thank the reviewer for raising this question. Indeed, interspecific interference and intraspecific interference simultaneously exist in real cases. To differentiate the separate contributions of inter- and intra-specific interference on biodiversity, we considered different scenarios involving inter- or intra-specific interference. In fact, we have also considered the scenario involving both inter- and intra-specific interference in our old version for the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1, where two consumer species compete for one resource species (Appendix-fig. 5, and lines 147-148, 162-163 in the main text of the old version, or lines 160-161, 175-177 in the new version).

      Following the reviewer’s suggestions, we have now systematically investigated the cases of S<sub>C</sub> = 6, S<sub>R</sub> = 1, and S<sub>C</sub> = 20, S<sub>R</sub> = 1, where six or twenty consumer species compete for one resource species in scenarios involving chasing pairs and both inter- and intra-specific interference using both ordinary differential equations (ODEs) and stochastic simulation algorithm (SSA). These newly added ODE and SSA results are shown in Appendix-fig. 5 F-H, and we have added a new paragraph to describe these results in our manuscript (lines 212-215 in the main text). Consistent with our findings in the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1, the species coexistence behavior in the cases of both S<sub>C</sub> = 6, S<sub>R</sub> = 1, and S<sub>C</sub> = 20, S<sub>R</sub> = 1 is very similar to those without interspecific interference: all consumer species coexist with one type of resources at constant population densities in the ODE studies, and the SSA results fluctuate around the population dynamics of the ODEs.

      As for the encounter rates of interspecific and intraspecific interference, in a well-mixed system these encounter rates can be derived from the mobility rates of the consumer species using the mean-field method. For a system with a size of L<sup>2</sup>, the interspecific encounter rate between consumer species C<sub>i</sub> and C<sub>j</sub> (i ≠ j) is given by a mean-field expression (please refer to lines 100-102, 293-317 in the main text, and see also Appendix-fig. 1), where r<sup>(I)</sup> is the upper distance for interference, while v<sub>C<sub>i</sub></sub> and v<sub>C<sub>j</sub></sub> represent the mobility rates of species C<sub>i</sub> and C<sub>j</sub>, respectively. Meanwhile, the intraspecific encounter rates within species C<sub>i</sub> and species C<sub>j</sub> are a’<sub>ii</sub> and a’<sub>jj</sub>, respectively.

      Thus, once the intraspecific encounter rates a’<sub>ii</sub> and a’<sub>jj</sub> are given, the interspecific encounter rate between species C<sub>i</sub> and C<sub>j</sub> is determined. Consequently, we could not tune the encounter rates of interspecific and intraspecific interference at will in our study, especially since, for clarity, we have used the mortality rate as the only parameter that varies among the consumer species throughout this study. Instead, we have systematically analyzed the influence of varying the separation rate and escape rate on species coexistence in the case of two consumers competing for a single type of resource (see Appendix-fig. 5A).

      (2) In the first two paragraphs of the introduction, the authors describe the competitive exclusion principle (CEP) and past attempts to overcome the CEP. Moving on from the first two paragraphs to the third paragraph, I think there is a gap that needs to be filled to make the transition smoother and help readers understand the motivations. More specifically, I think the authors need to add one more paragraph dedicated to explaining why predator interference is important, how considering the mechanism of predator interference may help overcome the CEP, and whether predator interference has been investigated or under-investigated in the past. Then building upon the more detailed introduction and movement of predator interference, the authors may briefly introduce the classical B-D phenomenological model and what are the conventional results derived from the classical B-D model as well as how they intend to extend the B-D model to consider the pairwise encounters.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have added a new paragraph and revised the introduction part of our paper (lines 51-67 in the main text). Our manuscript is significantly improved as a result.

      (3) The notations for the species abundances are not very informative. I believe some improvements can be made to make them more meaningful. For example, I think using Greek letters for consumers and English letters for resources might improve readability. Some sub-scripts are not necessary. For instance, R^(l)_0 can be simplified to g_l to denote the intrinsic growth rate of resource l. Similarly, K^(l)_0 can be simplified to K_l. Another example is R^(l)_a, which can be simplified to s_l to denote the supply rate. In addition, right now, it is hard to find all definitions across the text. I would suggest adding a separate illustrative box with all mathematical equations and explanations of symbols.

      We thank the reviewer for these very useful suggestions. We have now followed many of the suggestions to improve the readability of our manuscript. Given that we have used many English letters for consumers and there are already many symbols of English and Greek letters for different variables and parameters in the appendices, we have opted to use Greek letters for parameters specific to resource species and English letters for those specific to consumer species. Additionally, we have now added Appendix-tables 1-2 in the appendices (pages 16-17 in the appendices) to illustrate the symbols used throughout our manuscript.

      (4) What is the f_i(R^(F)) on line 131? Does it refer to the growth rate of C_i? I noticed that f_i(R^(F)) is defined in the supplementary information. But please ensure that readers can understand it even without reading the supplementary information. Otherwise, please directly refer to the supplementary information when f_i(R^(F)) occurs for the first time. Similarly, I don't think the readers can understand \Omega^\prime_i and G^\prime_i on lines 135-136.

      We thank the reviewer for raising these questions. We apologize for not illustrating those symbols and functions clearly enough in our previous version of the manuscript. f<sub>i</sub>(R<sup>(F)</sup>) is a function of the variable R<sup>(F)</sup> with the index i, which is defined as and for i=2. Following the reviewer’s suggestions, we have now added clear definitions for symbols and functions and resolved these issues. The definitions of \Omega_i, \Omega^\prime_i, G, and G^\prime are overly complex, and hence we directly refer to the Appendices when they occur for the first time in the main text.

      Reviewer #3 (Public Review):

      Summary:

      A central question in ecology is: Why are there so many species? This question gained heightened interest after the development of influential models in theoretical ecology in the 1960s, demonstrating that under certain conditions, two consumer species cannot coexist on the same resource. Since then, several mechanisms have been shown to be capable of breaking the competitive exclusion principle (although, we still lack a general understanding of the relative importance of the various mechanisms in promoting biodiversity).

      One mechanism that allows for breaking the competitive exclusion principle is predator interference. The Beddington-DeAngelis is a simple model that accounts for predator interference in the functional response of a predator. The B-D model is based on the idea that when two predators encounter one another, they waste some time engaging with one another which could otherwise be used to search for resources. While the model has been influential in theoretical ecology, it has also been criticized at times for several unusual assumptions, most critically, that predators interfere with each other regardless of whether they are already engaged in another interaction. However, there has been considerable work since then which has sought either to find sets of assumptions that lead to the B-D equation or to derive alternative equations from a more realistic set of assumptions (Ruxton et al. 1992; Cosner et al. 1999; Broom et al. 2010; Geritz and Gyllenberg 2012). This paper represents another attempt to more rigorously derive a model of predator interference by borrowing concepts from chemical reaction kinetics (the approach is similar to previous work: Ruxton et al. 1992). The main point of difference is that the model in the current manuscript allows for 'chasing pairs', where a predator and prey engage with one another to the exclusion of other interactions, a situation Ruxton et al. (1992) do not consider. While the resulting functional response is quite complex, the authors show that under certain conditions, one can get an analytical expression for the functional response of a predator as a function of predator and resource densities. They then go on to show that including intraspecific interference allows for the coexistence of multiple species on one or a few resources, and demonstrate that this result is robust to demographic stochasticity.

      We thank the reviewer for carefully reading our manuscript and for the positive comments on the rigorously derived model of predator interference presented in our paper. We also appreciate the reviewer for providing a thorough introduction to the research background of our study, especially the studies related to the Beddington-DeAngelis model. We apologize for our oversight in not fully appreciating the related study by Ruxton et al. (1992) at the time of our first submission. Indeed, as suggested by the reviewer, Ruxton et al. (1992) is relevant to our study in that we both borrowed concepts from chemical reaction kinetics. We have now reworked the introduction and discussion sections of our manuscript, and have cited and acknowledged the contributions of related works, including Ruxton et al. (1992).

      Strengths:

      I appreciate the effort to rigorously derive interaction rates from models of individual behaviors. As currently applied, functional responses (FRs) are estimated by fitting equations to feeding rate data across a range of prey or predator densities. In practice, such experiments are only possible for a limited set of species. This is problematic because whether a particular FR allows stability or coexistence depends on not just its functional form, but also its parameter values. The promise of the approach taken here is that one might be able to derive the functional response parameters of a particular predator species from species traits or more readily measurable behavioral data.

      We appreciate the reviewer's positive comments regarding the rigorous derivation of our model. Indeed, all parameters of our model can be derived from measurable behavioral data for a specific set of predator species.

      Weaknesses:

      The main weakness of this paper is that it devotes the vast majority of its length to demonstrating results that are already widely known in ecology. We have known for some time that predator interference can relax the CEP (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004).

      While the model presented in this paper differs from the functional form of the B-D in some cases, it would be difficult to formulate a model that includes intraspecific interference (that increases with predator density) that does not allow for coexistence under some parameter range. Thus, I find it strange that most of the main text of the paper deals with demonstrating that predator interference allows for coexistence, given that this result is already well known. A more useful contribution would focus on the extent to which the dynamics of this model differ from those of the B-D model.

      We appreciate the reviewer for raising this question and apologize for not sufficiently clarifying the contribution of our manuscript in the context of existing knowledge upon our initial submission. We have now significantly revised the introduction part of our manuscript (lines 51-67 in the main text) to make this clearer. Indeed, with the application of the Beddington-DeAngelis (B-D) model, several studies (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004) have already shown that intraspecific interference promotes species coexistence, and it is certain that the mechanism of intraspecific interference could lead to species coexistence if modeled correctly. However, while we acknowledge that the B-D model is a brilliant phenomenological model of intraspecific interference, the validity of applying the B-D model to obtain compelling results is highly questionable for the specific research topic of our manuscript, namely breaking the CEP and explaining the paradox of the plankton.

      Specifically, the functional response in the B-D model of intraspecific interference can be formally derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). Since we have demonstrated that the scenario involving only chasing pairs is under the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and given the identical functional response mentioned above, the validity of studies relying on the B-D model to break CEP or explain the paradox of the plankton is thus highly questionable.

      Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling intraspecific predator interference from a mechanistic perspective and applying rigorous mathematical analysis and numerical simulations, our work resolves these issues and demonstrates that intraspecific interference enables a wide range of consumer species to coexist with only one or a handful of resource species. This naturally breaks CEP, explains the paradox of plankton, and quantitatively illustrates a broad spectrum of experimental results.

      For intuitive understanding, we introduced a functional response in our model (presented as Eq. 5 in the main text), which indeed involves approximations. However, to rigorously break the CEP or explain the paradox of plankton, all simulation results in our study were directly derived from equations 1 to 4 (main text), without relying on the approximate functional response presented in Eq. 5.

      The formulation of chasing-pair engagements assumes that prey being chased by a predator are unavailable to other predators. For one, this seems inconsistent with the ecology of most predator-prey systems. In the system in which I work (coral reef fishes), prey under attack by one predator are much more likely to be attacked by other predators (whether it be a predator of the same species or otherwise). I find it challenging to think of a mechanism that would give rise to chased prey being unavailable to other predators. The authors also critique the B-D model: "However, the functional response of the B-D model involving intraspecific interference can be formally derived from the scenario involving only chasing pairs without predator interference (Wang and Liu, 2020; Huisman and De Boer, 1997) (see Eqs. S8 and S24). Therefore, the validity of applying the B-D model to break the CEP is questionable.".

      We appreciate the reviewer for raising this question. We fully agree with the reviewer that in many predator-prey systems (e.g., coral reef fishes as mentioned by the reviewer, wolves, and even microbial species such as Myxococcus xanthus; related references: Berleman et al., FEMS Microbiol. Rev. 33, 942-957 (2009)), prey under attack by one predator can be targeted by another predator (which we term as a chasing triplet) or even by additional predator individuals (which we define as higher-order terms). However, since we have already demonstrated in a previous study (Xin Wang, Yang-Yu Liu, iScience 23, 101009 (2020)) from a mechanistic perspective that a scenario involving chasing triplets or higher-order terms can naturally break the CEP, while our manuscript focuses on whether pairwise encounters between individuals can break the CEP and explain the paradox of plankton, we deliberately excluded confounding factors that are already known to promote biodiversity, just as we excluded prevalent factors such as cross-feeding and temporal variations in our model.

      However, the way "chasing pairs" are formulated does result in predator interference because a predator attacking prey interferes with the ability of other predators to encounter the prey. I don't follow the author's logic that B-D isn't a valid explanation for coexistence because a model incorporating chasing pairs engagements results in the same functional form as B-D.

      We thank the reviewer for raising this question, and we apologize for not making this point clear enough at the time of our initial submission. We have now revised the related part of our manuscript (lines 56-62 in the main text) to make this clearer.

      In our definition, predator interference means the pairwise encounter between consumer individuals, while a chasing pair is formed by a pairwise encounter between a consumer individual and a resource individual. Thus, in these definitions, a scenario involving only chasing pairs does not involve pairwise encounters between consumer individuals (which is our definition of predator interference).

      We acknowledge that there can be different definitions of predator interference, and the reviewer's interpretation is based on a definition of predator interference that incorporates indirect interference without pairwise encounters between consumer individuals. We do not wish to argue about the appropriateness of definitions. However, we have proven that scenarios involving only chasing pairs are under the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and the functional response of the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). The validity of applying the B-D model to break CEP is therefore highly questionable.

      More broadly, the specific functional form used to model predator interference is of secondary importance to the general insight that intraspecific interference (however it is modeled) can allow for coexistence. Mechanisms of predator interference are complex and vary substantially across species. Thus it is unlikely that any one specific functional form is generally applicable.

      We thank the reviewer for raising this issue. We agree that the general insight that intraspecific predator interference can facilitate species coexistence is of great importance. We also acknowledge that any functional form of a functional response is unlikely to be universally applicable, as explicit functional responses inevitably involve approximations. However, we must reemphasize the importance of verifying whether intraspecific predator interference can truly break CEP and explain the paradox of plankton, which is one of the primary objectives of our study. As mentioned above, the B-D model can be derived from the scenario involving only chasing pairs (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and we have demonstrated that scenarios involving only chasing pairs are subject to the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). The validity of applying the B-D model to break CEP is therefore highly questionable.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not see any code or data sharing. They should exist in a prominent place. The authors should make their simulations and the analysis scripts freely available to download, e.g. by GitHub. This is always true but especially so in a journal like eLife.

      We appreciate the reviewer for these recommendations. We apologize for our oversight regarding the unsuccessful upload of the data in our initial submission, as the data size was considerable and we neglected to double-check for this issue. Following the reviewer’s recommendation, we have now uploaded the code and dataset to GitHub (accessible at https://github.com/SchordK/Intraspecific-predator-interference-promotesbiodiversity-in-ecosystems), where they are freely available for download.

      The introduction section should include more background, including about BD but also about consumer-resource models. Part of the results section could be moved/edited to the introduction. You should try that the results section should contain only "new" stuff whereas the "old" stuff should go in the introduction.

      We thank the reviewer for these recommendations. Following these suggestions, we have now reorganized our manuscript by adding a new paragraph to the introduction section (lines 51-62 in the main text) and revising related content in both the introduction and results sections (lines 63-67, 81-83 in the main text).

      I found myself getting a little bogged down in the general/formal description of the model before you go to specific cases. I found the most interesting part of the paper to be its second half. This is a dangerous strategy, a casual reader may miss out on the most interesting part of the paper. It's your paper and do what you think is best, but my opinion is that you could improve the presentation of the model and background to get to the specific contribution and specific use case quickly and easily, then immediately to the data. You can leave the more general formulation and the details to later in the paper or even the appendix. Ultimately, you have a simple idea and a beautiful application on interesting data-that is your strength I think, and so, I would focus on that.

      We appreciate the reviewer for the positive comments and valuable suggestions. Following these recommendations, we have revised the presentation of the background information to clarify the contribution of our manuscript, and we have refined our model presentation to enhance clarity. Meanwhile, as we need to address the concerns raised by other reviewers, we continue to maintain systematic investigations for scenarios involving different forms of pairwise encounters in the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1 before applying our model to the experimental data.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe the surfaces in Figs. 1F-H correspond to the zero-growth isoclines. The authors should directly point it out in the figure captions and text descriptions.

      We thank the reviewer for this suggestion, and we have followed it to address the issue.

      (2) After showing equations 1 or 2, I believe it will help readers understand the mechanism of equations by adding text such as "(see Fig. 1B)" to the sentences following the equations.

      We appreciate the reviewer's suggestion, and we have implemented it to address the issue.

      (3) Lines 12, 129, 143 & 188: "at steady state" -> "at a steady state"

      (4) Line 138: "is doom to extinct" -> "is doomed to extinct"

      (5) Line 170: "intraspecific interference promotes species coexistence along with stochasticity" -> "intraspecific interference still robustly promotes species coexistence when stochasticity is considered"

      (6) Line 190: "The long-term coexistence behavior are exemplified" -> "The long-term coexistence behavior is exemplified"

      (7) Line 227: "the coefficient of variation was taken round 0.3" -> "the coefficient of variation was taken around 0.3"?

      (8) Line 235: "tend to extinct" -> "tend to be extinct"

      We thank the reviewer for all these suggestions, and we have implemented each of them to revise our manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I think this would be a much more useful paper if the authors focused on how the behavior of this model differs from existing models rather than showing that the new formation also generates the same dynamics as the existing theory.

      We thank the reviewer for this suggestion, and we apologize for not explaining the limitations of the B-D model and the related studies on the topic of CEP clearly enough at the time of our initial submission. As we have explained in the responses above, we have now revised the introduction part of our manuscript (lines 51-67 in the main text) to make the following point clear. The functional response in the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals, and we have demonstrated that a scenario involving only chasing pairs is under the constraint of CEP. The validity of the studies relying on the B-D model to break CEP or explain the paradox of the plankton is therefore highly questionable. Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling from a mechanistic perspective, we resolve the above issues and quantitatively illustrate a broad spectrum of experimental results, including two classical experiments that violate CEP and the rank-abundance curves across diverse ecological communities.

      Things that would be of interest:

      What are the conditions for coexistence in this model? Presumably, it depends heavily on the equilibrium abundances of the consumers and resources as well as the engagement times/rates.

      We thank the reviewer for raising this question. We have shown that there is a wide range of parameter space for species coexistence in our model. Specifically, for the case involving two consumer species and one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1), we have conducted a systematic study on the parameter region for promoting species coexistence. For clarity, we set the mortality rate 𝐷<sub>i</sub> (i = 1, 2) as the only parameter that varies with the consumer species, and the order of magnitude of all model parameters was estimated from behavioral data. The results for scenarios involving intraspecific predator interference are shown in Appendix-figs. 4B-D, 5A, 6C-D and we redraw some of them here as Fig. R2, including both ODEs and SSA results, wherein Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> represents the competitive difference between the two consumer species. For example, Δ = 1 means that species C<sub>2</sub> is twice as competitive as species C<sub>1</sub>. In Fig. R2 (see also Appendix-figs. 4B-D, 5A, 6C-D), we see that the two consumer species can coexist with a large competitive difference in both ODE and SSA simulation studies.
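
As a quick numerical check of the definition of Δ given above, the following minimal Python sketch computes the competitive difference from two mortality rates. The function name and the example rates are illustrative assumptions and are not taken from the manuscript.

```python
def competitive_difference(d1: float, d2: float) -> float:
    """Relative mortality difference, Delta = (D1 - D2) / D2."""
    if d2 <= 0:
        raise ValueError("D2 must be positive")
    return (d1 - d2) / d2


if __name__ == "__main__":
    # Hypothetical rates: the mortality of C1 is twice that of C2,
    # which corresponds to Delta = 1 in the definition above.
    print(competitive_difference(0.02, 0.01))
```

With these hypothetical values, a mortality rate for C<sub>1</sub> that is double that of C<sub>2</sub> gives Δ = 1, matching the example in the text.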

      Author response image 2.

      The parameter region for two consumer species coexisting with one type of abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) The region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) The blue region represents stable coexistence at a steady state for the three species. (C) The color indicates (refer to the color bar) the coexisting fraction for long-term coexistence of the three species. Figure redrawn from Appendix-figs. 4B, 6C-D.

      For systems shown in Fig. 3A-D, where the number of consumer species is much larger than that of the resource species, we set each consumer species with unique competitiveness through a distinctive 𝐷<sub>i</sub> (i = 1,…, S<sub>C</sub>). In Fig. 3A-D (see also Appendix-fig. 10), we see that hundreds of consumer species may coexist with one or three types of resources when the coefficient of variation (CV) of the consumer species’ competitiveness was taken around 0.3, which indicates a large parameter region for promoting species coexistence.

      Is there existing data to estimate the parameters in the model directly from behavioral data? Do these parameter ranges support the hypothesis that predator interference is significant enough to allow for the coexistence of natural predator populations?

      We appreciate the reviewer for raising this question. Indeed, the parameters in our model were primarily determined by estimating their reasonable range from behavioral data. Following the reviewer's suggestions, we have now specified the data we used to set the parameters. For instance, in Fig. 2D, we set 𝐷<sub>2</sub> = 0.01 with τ = 0.4 day, resulting in an expected lifespan of Drosophila serrata in our model setting of 𝜏/𝐷<sub>2</sub> = 40 days, which roughly agrees with experimental behavioral data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To account for competitive differences, we set the mortality rate as the only parameter that varies among the consumer species. As specified in the Appendices, the CV of the mortality rate is the only parameter that was used to fit the experiments within the range of 0.15-0.43. This parameter range (i.e., 0.15-0.43) was directly estimated from experimental data in the reference article (Patricia Menon et al., Water Research 37, 4151 (2003)) using the two-sigma rule (lines 344-347 in the appendices).
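
The lifespan estimate quoted above is a one-line calculation, sketched here in Python for readers who want to check it. The values of τ and 𝐷<sub>2</sub> and the reported male and female lifespans come from the response text; the function and variable names are illustrative assumptions.

```python
def expected_lifespan_days(tau_days: float, mortality: float) -> float:
    """Expected lifespan tau / D for a dimensionless mortality rate D."""
    if mortality <= 0:
        raise ValueError("mortality must be positive")
    return tau_days / mortality


TAU_DAYS = 0.4       # time scale tau quoted in the response (days)
D2 = 0.01            # mortality rate D_2 quoted for Fig. 2D
REPORTED_MALE = 34   # average lifespan of male D. serrata (days)
REPORTED_FEMALE = 54 # average lifespan of female D. serrata (days)

if __name__ == "__main__":
    model = expected_lifespan_days(TAU_DAYS, D2)
    in_range = REPORTED_MALE <= model <= REPORTED_FEMALE
    print(f"model lifespan: {model:.1f} days, within reported range: {in_range}")
```

Under these values the model lifespan falls between the two reported averages, which is the sense in which the response describes the agreement as rough.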

      Given the high consistency between the model results and experiments shown in Figs. 2D-E and 3C-D, where all the key model parameters were estimated from experimental data in references, and considering that the rank-abundance curves shown in Fig. 3C-D include a wide range of ecological communities, there is no doubt that predator interference is significant enough to allow for the coexistence of natural predator populations within the parameter ranges estimated from experimental references.

      Bifurcation analyses for the novel parameters of this model. Does the fact that prey can escape lead to qualitatively different model behaviors?

      Author response image 3.

      Bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> (i = 1, 2) of our model in the case of two consumer species competing for one abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) A 3D representation: the region above the blue surface signifies competitive exclusion where species C<sub>1</sub> goes extinct, while the region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) A 2D representation: the blue region represents stable coexistence at a steady state for the three species. Figure redrawn from Appendix-fig. 4C-D.

      We appreciate the reviewer for this suggestion. Following this suggestion, we have conducted bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> of our model in the case where two consumer species compete for one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). Both 2D and 3D representations of these results have been included in Appendix-fig. 4, and we redraw them here as Fig. R3. In Fig. R3, we set the mortality rate 𝐷<sub>i</sub> (i = 1, 2) as the only parameter that varies between the consumer species, and thus Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> represents the competitive difference between the two species.

      As shown in Fig. R3A-B, the smaller the escape rate d<sub>i</sub>, the larger the competitive difference Δ tolerated for species coexistence at steady state. A similar trend is observed for the separate rate d’<sub>i</sub>. However, there is an abrupt change in both the 2D and 3D representations at d’<sub>i</sub> = 0, since if d’<sub>i</sub> = 0, all consumer individuals would be trapped in interference pairs, and then no consumer species could exist. In contrast, there is no abrupt change in either representation at d<sub>i</sub> = 0, since even if d<sub>i</sub> = 0, the consumer individuals could still leave the chasing pair through the capture process.
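
The trapping argument for d’<sub>i</sub> = 0 can be illustrated with a deliberately reduced toy system. The sketch below is not the authors' model (their Eqs. 1-4 are not reproduced in this response): it keeps only the formation and separation of interference pairs within a single consumer species at fixed total density, and omits resources, chasing pairs, capture, birth, and death. The encounter rate `a_prime`, the function names, and all numerical values are assumptions made for illustration.

```python
import math


def free_consumer_density(a_prime: float, d_prime: float, total: float = 1.0,
                          t_end: float = 1000.0, dt: float = 0.01) -> float:
    """Forward-Euler integration of a toy pair-formation model.

    free + free -> pair   at rate a_prime * free**2
    pair -> free + free   at rate d_prime * pairs
    The total density free + 2 * pairs is conserved.
    """
    free, pairs = total, 0.0
    for _ in range(int(round(t_end / dt))):
        formation = a_prime * free * free
        separation = d_prime * pairs
        free += dt * 2.0 * (separation - formation)
        pairs += dt * (formation - separation)
    return free


def analytic_equilibrium(a_prime: float, d_prime: float, total: float = 1.0) -> float:
    """Positive root of 2*a'*x**2 + d'*x - d'*N = 0 for the free density x."""
    disc = d_prime ** 2 + 8.0 * a_prime * d_prime * total
    return (-d_prime + math.sqrt(disc)) / (4.0 * a_prime)


if __name__ == "__main__":
    print("d' = 0.1:", free_consumer_density(0.1, 0.1))
    print("d' = 0  :", free_consumer_density(0.1, 0.0))
```

In this toy setting, a positive separation rate leaves a finite pool of free consumers at equilibrium, whereas with d’ = 0 the free pool keeps shrinking toward zero as individuals accumulate in pairs. This mirrors, in a much simplified form, the qualitative reason given above for the abrupt change at d’<sub>i</sub> = 0.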

      Figures: I found the 3D plots especially Appendix Figure 2 very difficult to interpret. I think 2D plots with multiple lines to represent predator densities would be more clear.

      We thank the reviewer for this suggestion. Following this suggestion, we have added a 2D diagram to Appendix-fig. 2.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy and provide the caveat that while the physiological data were collected from the antennal lobe, there could be other olfactory processing stages involved. Indeed other processing stages could be the sites for the computational functions proposed by the model. There is an additional caveat which is that the physiological data were collected 5-10 minutes after serotonin application whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours subsequent to serotonin application. The discrepancy between physiology and behavior could easily reflect the timing of action of serotonin (i.e. differences between immediate and longer-term impact).

      For our behavioral experiments, we waited 3 hours after serotonin injection to allow serotonin to penetrate through the layers of air sacs and the sheath, and for the locusts to calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction. Hence, it was difficult to hold cells for that long. However, the point raised by the reviewer is well-taken. We have performed additional experiments to show that the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of injecting serotonin (Author response image 2) and that the physiological changes in PNs (bursting spontaneous activity, maintenance of temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for a longer duration (i.e. 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in-vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long, and we add them to this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath. (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods. (B) Rasterized spiking activities in two recorded PNs are shown. Spontaneous and odor-evoked responses are shown in all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants; hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). As can be noted, PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after (5HT) serotonin injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would be impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e. odor-specific increase and decrease in an appetitive behavior) and non-specificity in the changes at the level of individual PNs (general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses (Author response image 4).
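
One way a uniform increase in PN activity could produce odor-specific behavioral changes can be sketched with a fixed-weight readout. The Python example below is a hypothetical illustration and not the authors' model, whose details are not given in this response: it assumes a linear readout with fixed positive and negative weights followed by a logistic output, and all weights, firing rates, and the gain factor are invented for the example.

```python
import math


def por_probability(pn_rates, weights, gain=1.0, bias=0.0):
    """Logistic readout of a fixed-weight sum of PN responses.

    `gain` multiplies every PN response equally, standing in for a
    uniform, non-specific increase in odor-evoked activity.
    """
    drive = sum(w * gain * r for w, r in zip(weights, pn_rates))
    return 1.0 / (1.0 + math.exp(-(drive + bias)))


# Hypothetical readout weights: two PNs favor POR, two oppose it.
WEIGHTS = [1.0, 1.0, -1.0, -1.0]

# Hypothetical PN response vectors for two odorants.
APPETITIVE = [2.0, 1.5, 0.5, 0.2]      # mostly drives positive-weight PNs
NON_APPETITIVE = [0.3, 0.2, 1.8, 1.5]  # mostly drives negative-weight PNs

if __name__ == "__main__":
    for name, rates in (("appetitive", APPETITIVE),
                        ("non-appetitive", NON_APPETITIVE)):
        before = por_probability(rates, WEIGHTS, gain=1.0)
        after = por_probability(rates, WEIGHTS, gain=1.5)
        print(f"{name}: {before:.3f} -> {after:.3f}")
```

In this sketch the same gain applied to every PN pushes the readout further from its midpoint in whichever direction the weighted sum already points, so the predicted response rises for one odorant and falls for the other even though the weights never change.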

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion (Author response image 3).

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the proposed model, we used a published dataset [Chandak and Raman, 2023]: Neural dataset – 89 PN responses to a panel of twenty-two odorants; Behavioral dataset – probability of POR responses to the same twenty-two odorants. We built the model using just the three odorants overlapping between the two datasets: hexanol, benzaldehyde and linalool. The true POR probabilities and the POR probabilities predicted by the model are shown for all twenty-two odorants as a scatter plot. As can be noted, there is a high correlation (0.79) between the true and the predicted values.

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of other cell types than projection neurons. This should still be discussed, serotonin might rather shut down baseline activation of local inhibitory neurons - and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      As we agreed, there could be other cells that are impacted by serotonin release. Our goal in this study was to characterize how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Within this circuit, there are local inhibitory neurons (LNs), as correctly indicated by this reviewer. Surprisingly, our preliminary data indicate that LNs are not shut down but also have an enhanced odor-evoked neural response (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Regardless, since PN activity should incorporate the effects of changes in the local neuron responses and is the sole output from the antennal lobe that drives all downstream odor-evoked activity, we focused on PNs in this study.

      Author response image 5.

      Representative traces showing intracellular recording from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4 s) is shown. The left and right panels show LN activity before and after the application of 5HT. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:

      It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations only evoked a very weak neural response, and the higher concentrations evoked more robust responses. The POR responses for these odorants at various concentrations chosen are included in Figure 2. Note, that the responses to linalool and ammonium remained weak throughout the concentration changes, compared to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No, only a subset of them responds to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      What is shown in Figure 6A is the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. The dataset of 82 odor-PN pairs was collected intracellularly, examining the responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe we showed this with our computational model, how uniform changes in the neural responses could lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that the fed locusts had lower PORs to hex and bza. The POR responses significantly increased after the 5HT application. However, we have rephrased this sentence to limit our claims to this result. "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts."

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised. "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, were reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined." The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was also not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data has been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that the PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity in individual PNs as we injected different levels of current (200-1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and therefore the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust along with the antennae and maxillary palps protruded out of this immobilization tube so that they could be moved freely by the locust. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activities.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
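
The binary scoring described above lends itself to a simple summary: the POR probability for an odorant is the fraction of locusts whose palps opened. A minimal Python sketch follows, assuming one 0/1 score per locust per odorant. The function name `por_probability` and all scores are hypothetical illustrations, not data from the study.

```python
def por_probability(scores):
    """Return the fraction of trials scored 1 (palps opened).

    `scores` is a list of 0/1 values, one per locust, for a single odorant.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 (closed) or 1 (opened)")
    return sum(scores) / len(scores)


# Hypothetical scores for ten locusts; illustrative values only.
scores_by_odor = {
    "hexanol": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "linalool": [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
}

por = {odor: por_probability(s) for odor, s in scores_by_odor.items()}
```

Keeping the per-locust scores, rather than only the summary fraction, is what makes a paired before/after comparison within the same animals possible.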

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).
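
For readers unfamiliar with the paired-sample test named in the caption, the statistic can be sketched as below. This is an illustrative Python sketch, not the authors' analysis code. The function name `paired_t_statistic` and the before/after values are hypothetical. It returns only the t statistic and the degrees of freedom; the one-tailed p-value would then be read from the Student's t distribution, for example with a statistics library.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = after - before per animal.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    if len(before) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(after, before)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical 0/1 POR scores for the same ten locusts; illustrative only.
before_5ht = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
after_5ht = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]

t_stat, dof = paired_t_statistic(before_5ht, after_5ht)
```

A one-tailed test asks only whether the responses increased after injection, so a positive t statistic is the direction of interest here.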

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, due to the physiological properties of the LNs, their signals being too small, they are also not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident with our claims and conclusion.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights had response only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
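
The idea that a uniform increase in PN responses can still yield odor-specific behavioral changes can be illustrated with a toy weighted readout. The sketch below is a hedged Python illustration, not the model used in the manuscript. The firing rates, the unit weights, the gain factor, and the function names `readout` and `scale` are all hypothetical.

```python
def readout(rates, weights):
    """Weighted sum of PN firing rates; higher values favor a POR."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have equal length")
    return sum(w * r for w, r in zip(weights, rates))


def scale(rates, gain):
    """Apply the same multiplicative gain to every PN response."""
    return [gain * r for r in rates]


# Hypothetical firing rates (spikes/s) for eight PNs; illustrative only.
# PNs 0-3 respond to hexanol, PNs 4-7 respond to linalool.
hexanol = [20.0, 18.0, 15.0, 22.0, 0.0, 0.0, 0.0, 0.0]
linalool = [0.0, 0.0, 0.0, 0.0, 12.0, 16.0, 14.0, 10.0]

# Assumed weights: positive for the hexanol group, negative for the linalool group.
weights = [1.0] * 4 + [-1.0] * 4

gain = 1.5  # assumed uniform response enhancement

hex_before = readout(hexanol, weights)
hex_after = readout(scale(hexanol, gain), weights)
lool_before = readout(linalool, weights)
lool_after = readout(scale(linalool, gain), weights)
```

Because the two odorants drive non-overlapping PN groups with opposite weights, the same gain raises the readout for hexanol and lowers it for linalool.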

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes), before (dark shade) and after 5HT injection (lighter shade). To allow comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). The significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output in response to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of food odorant – hexanol, aggregation pheromone – 4-vinyl anisole, sex pheromone – benzaldehyde, acid – octanoic acid, base – ammonium, and alcohol – 1-nonanol. Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen based on trial-and-error experiments: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) due to the presence of anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, which cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We did ephys experiments 5-10 minutes after bath application, as it would be extremely hard to hold cells for 3 hours.

      A longer delay was required for our behavioral experiments as the locusts tended to be a bit more agitated, with larger spontaneous movements of palps, and also exhibited unprompted vomiting. A 3-hour period allowed the locust to regain its baseline level of movement after 5HT introduction. [This information has been added to the methods section of the revised manuscript]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since these signals are small, they will be buried under the noise floor when using extracellular recording electrodes for monitoring responses in the antennal lobe. Hence, we are pretty certain what type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      “Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computational reasonable tools, and having an in-depth discussion of the significant results.”

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      “The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.”

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      “The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. it is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.”

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already proven to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel insights from new cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to reflect known biology against the unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      “I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.”

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages but our principal effort was to characterize susceptibility BEFORE disease onset.

      “Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.” 

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. In future work, we aim to carry out such efforts, which will represent a substantial time and financial commitment.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      “Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?”

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      “Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.”

      We will add a section to the revision to address the rationale behind different OCRs categories.

      “Line 129: should "-1,500/+500bp" be "-500/+500bp"?”

      A gene promoter was defined as a region 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the assembly of transcription machinery occurs up to 1000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this defined promoter length allowed us to capture such SNPs in our analyses.
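
      The promoter window described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function name `promoter_window`, the plain integer coordinates, and the clamping at position 0 are assumptions introduced here. Only the 1,500 bp upstream / 500 bp downstream extent comes from the response.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str,
                    upstream: int = 1500, downstream: int = 500) -> Tuple[int, int]:
    """Return (start, end) of a promoter window around a TSS.

    Hypothetical helper: "upstream" is measured against the direction of
    transcription, so on the minus strand it lies at higher coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Assumption: coordinates cannot be negative, so clamp at the chromosome start.
    return max(0, start), end


if __name__ == "__main__":
    print(promoter_window(10_000, "+"))
    print(promoter_window(10_000, "-"))
```

      A SNP position can then be tested against the returned interval; the asymmetry of the window reflects the authors' stated interest in transcription factor binding sites, which lie mostly 5’ of the TSS.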

      “How did the authors define a contact region?”

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      “The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.”

      “In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.”

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. Figure 2A will become Figure 1A in the revision in order to aid the reader.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      [1] The first chromatin loop has one contact region that overlaps an OCR at one end and a gene promoter at the other. This qualifies as a cRE; thus, the area under the yellow ATAC-peak triangle is green.

      [2] The second loop overlaps an OCR at one end only, with no gene promoter nearby, so it does not qualify as a cRE.

      [3] The third chromatin loop has an OCR and a promoter overlapping at one end. We defined this as a special case of a cRE; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.
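
      The three loop configurations listed above can be expressed as a small interval-overlap check. The sketch below is one reading of that description, not the authors' implementation: the names `overlaps` and `classify_ocr`, the half-open `(start, end)` intervals on a single chromosome, and the labels `"distal"` and `"proximal"` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]          # (start, end), half-open, one chromosome
Loop = Tuple[Interval, Interval]    # the two contact regions of a chromatin loop


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_ocr(ocr: Interval, loops: List[Loop],
                 promoters: List[Interval]) -> Optional[str]:
    """Label an OCR following cases [1]-[3] (hypothetical helper).

    "distal"   : OCR in one contact region, promoter in the partner region (case 1)
    "proximal" : OCR and promoter in the same contact region (case 3)
    None       : no promoter reachable through any loop (case 2)
    """
    for left, right in loops:
        for near, far in ((left, right), (right, left)):
            if not overlaps(ocr, near):
                continue
            if any(overlaps(far, p) for p in promoters):
                return "distal"
            if any(overlaps(near, p) for p in promoters):
                return "proximal"
    return None


if __name__ == "__main__":
    loops = [((100, 200), (900, 1000))]
    promoters = [(950, 1050)]
    print(classify_ocr((150, 180), loops, promoters))
    print(classify_ocr((960, 990), loops, promoters))
    print(classify_ocr((150, 180), loops, []))
```

      Under this reading, a proxy SNP would count as falling within a cRE when it lies inside an OCR for which `classify_ocr` returns a label rather than `None`.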

      “Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.”

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.

      “Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.”

      cREs are a subset of OCRs.

      - In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Do these correspond to the four types of regions in Figure 2A?

      Figure 2A will become Figure 1A in the revision and will be modified to showcase how we define OCRs and cREs.

      “It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.”

      In the Methods section entitled “Genetic loci included in variant-to-genes mapping," we described the process of LD expansion to include 771 proxies from 19 sentinel signals significantly associated with obesity. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates the proportions of these proxies located within different types of regions, reducing the proxy list to 94 located within our defined cREs.

      “Figure 2. What's the difference between the 771 and 758 proxies?”

      13 out of 771 proxies did not fall within any defined regions. The remaining 758 were located within contact regions of at least one cell type regardless of chromatin state.

      (3) Typos

      “In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.”

      This will be amended in the revision.

      “Hi-C technology should be spelled as Hi-C. In many places it is misspelled as "hi-C". In Figure 1, the authors used "hiC" in the legend. Similarly, Capture-C was sometimes spelled as "capture-C" in the manuscript.”

      “At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".”

      “In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region".”

      These typos and terminology inconsistencies will be amended in the revision.

    1. Author response:

      Provisional author response to Reviewer #1

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we plan to address these as well as possible during the revision of our manuscript.

      We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The notion that patients with liver cirrhosis have increased plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.

      The performed kinase activity analysis revealed that miR-26b affects kinases that play a particularly important role in inflammation and angiogenesis. Strikingly, and in support of these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.

      Provisional author response to Reviewer #2

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. Regarding the weaknesses, we plan to address these as well as possible during the revision of our manuscript. The validation suggestions in particular are very valuable, and we plan to address these in the revision by performing additional experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Komarova et al. investigate the clinical prognostic ability of cell-level metabolic heterogeneity quantified via the fluorescence lifetime characteristics of NAD(P)H. Fluorescence lifetime imaging microscopy (FLIM) has been studied as a minimally invasive approach to measure cellular metabolism in live cell cultures, organoids, and animal models. Its clinical translation is spearheaded through macroscopic implementation approaches that are capable of large sampling areas and enable access to otherwise constrained spaces but lack cellular resolution for a one-to-one transition with traditional microscopy approaches, making the interpretation of the results a complicated task. The merit of this study primarily lies in its design by analyzing with the same instrumentation and approach colorectal samples in different research scenarios, namely in vitro cells, in vivo animal xenografts, and tumor tissue from human patients. These conform to a valuable dataset to explore the translational interpretation hurdles with samples of increasing levels of complexity. For human samples, the study specifically investigates the prediction ability of NAD(P)H fluorescence metrics for the binary classification of tumors of low and advanced stage, with and without metastasis, and low and high grade. They find that NAD(P)H fluorescence properties have a strong potential to distinguish between high- and low-grade tumors and a moderate ability to distinguish advanced-stage tumors from low-stage tumors. This study provides valuable results contributing to the deployment of minimally invasive optical imaging techniques to quantify tumor properties and potentially migrate into tools for human tumor characterization and clinical diagnosis.

      Strengths:

      The investigation of colorectal samples under multiple imaging scenarios with the same instrument and approach conforms to a valuable dataset that can facilitate the interpretation of results across the spectrum of sample complexity.

      The manuscript provides a strong discussion reviewing studies that investigated cellular metabolism with FLIM and the metabolic heterogeneity of colorectal cancer in general.

      The authors do a thorough acknowledgement of the experimental limitations of investigating human samples ex vivo, and the analytical limitation of manual segmentation, for which they provide a path forward for higher throughput analysis.

      Weaknesses:

      To substantiate the changes in fluorescence properties at the examined wavelength range (associated with NAD(P)H fluorescence) in relationship to metabolism, the study would strongly benefit from additional quantification of metabolic-associated metrics using currently established standard methods. This is especially interesting when discussing heterogeneity, which is presumably high within and between patients with colorectal cancer, and could help explain the particularities of each sample leading to a more in-depth analysis of the acquired valuable dataset.

      In order to address this issue, we have performed immunohistochemical staining of the available tumor samples for the two standard metabolic markers GLUT3 and LDHA.

      The results are included in Supplementary (Fig.S4). Discussion has been extended.

      Additionally, NAD(P)H fluorescence does not provide a complete picture of the cell/tissue metabolic characteristics. Including, or discussing the implications of including fluorescence from flavins would comprise a more compelling dataset. These additional data would also enable the quantification of redox metrics, as briefly mentioned, which could positively contribute to the prognosis potential of metabolic heterogeneity.

      We agree with the Reviewer that fluorescence from flavins could be helpful to obtain more complete data on cellular metabolic states. However, we were unable to detect sufficiently intense emission from flavins in colorectal cancer cells and tissues. A paragraph about flavins was added to the Discussion, and representative images were added to the Supplementary Material (Figure S5).

      In the current form of the manuscript, there is a diluted interpretation and discussion of the results obtained from the random forest and SHAP analysis regarding the ability of the FLIM parameters to predict clinicopathological outcomes. This is, not only the main point the authors are trying to convey given the title and the stated goals, but also a novel result given the scarce availability of these type of data, which could have a remarkable impact on colorectal cancer in situ diagnosis and therapy monitoring. These data merit a more in-depth analysis of the different factors involved. In this context, the authors should clarify how is the "trend of association" quantified (lines 194 and 199).

      We thank the Reviewer for this suggestion. The section has been updated with SHAP analysis using different parameters (dispersion D of t2, a1, tm and bimodality index BI of t2, a1, tm). It is now clearer that D-a1 is more strongly associated with clinicopathological outcomes than the other variables. We have also added some biological interpretation of these results in the Discussion.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Metabolic heterogeneity of colorectal cancer as a prognostic factor: insights gained from fluorescence lifetime imaging" by Komarova et al., the authors used fluorescence lifetime imaging and quantitative analysis to assess the metabolic heterogeneity of colorectal cancer. Generally, this work is logically well-designed, including in vitro and in vivo animal models and ex vivo patient samples. However, since the key parameter presented in this study, the BI index, is already published in a previous paper by this group (Shirshin et al., 2022), and the quantification method of metabolic heterogeneity has already been well (and even better) described in previous studies (such as the one by Heaster et al., 2019), the novelty of this study is doubted. Moreover, I am afraid that the way of data analysis and presentation in this study is not well done, which will be mentioned in detail in the following sections.

      Strengths:

      (1) Solid experiments are performed and well-organized, including in vitro and in vivo animal models and ex vivo patient samples.

      (2) Attempt and efforts to build the association between the metabolic heterogeneity and prognosis for colorectal cancer.

      Weaknesses:

      (1) The human sample number (from 21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      An additional 8 patient tumor samples, collected while the manuscript was under review, were added to the present data. We agree that the number is still too limited to draw conclusions about the prognostic value of cell-level metabolic heterogeneity. At this point, however, we expect that this parameter will become a metric for prognosis. We will continue this study to collect more samples of colorectal tumors and expand the approach to different cancer types.

      (2) The BI index or similar optical metrics have been well established by this and other groups; therefore, the novelty of this study is doubted.

      The purpose of this research was to quantify and compare the cellular metabolic heterogeneity across systems of different complexity - commercial cell lines, tumor xenografts and patients’ tumors - using previously established FLIM-based metrics. For the first time, using FLIM, it was shown that the heterogeneity of patients’ samples is much higher than that of laboratory models and that it is associated with clinical characteristics of the tumors - the stage and the grade. In addition, this study provides evidence that bimodality (BI) in the distribution of metabolic features in the cell population is less important than the width of the spread (the dispersion value D).

      Some corrections have been made in the text on this point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following comments should be addressed to strengthen the rigor and clarity of the manuscript.

      (1) The ethical committee that approved the human studies should also be mentioned in the methods section, as was done with the animal studies.

      Information about the ethics committee has been added in the Manuscript.

      The study with the use of patients’ material was approved by the ethics committee of the Privolzhsky Research Medical University (approval № 09 from 30.06.2023).

      (2) The captions in Figures 2 and 3 must be revised. In Figure 2, it seems the last 2 sentences for the description of (C) do not belong there, and instead, the last sentence in the description of (D) may need to be included in (C) instead. Figure 3 is similar.

      The captions were revised.

      (3) From supplement Figure S2 it seems that EpCam and vimentin staining were only done in two of the mouse tumor types. No further mention is made in the results or methods section. Is there any reason this was not performed in the other tumor types? Were the histology and IHC protocols the same for the mouse and human tumors?

      The data on other tumor types and patients’ tumors have been added in Figure S3. Discussion was extended with the following paragraph.

      One of the possible reasons for metabolic heterogeneity could be the presence of stromal cells or diversity of epithelial and mesenchymal phenotypes of cancer cells within a tumor. Immunohistochemical staining of tumors for EpCam (epithelial marker) and vimentin (mesenchymal marker) showed that the fraction of epithelial, EpCam-positive, cells was more than 90% in tumor xenografts and on average 76±10 % in patients’ tumors (Figure S3). However, the ratio of EpCam- to vimentin-positive cells in patients’ samples neither correlated with D-a1 nor with BI-a1, which means that the presence of cells with mesenchymal phenotype did not contribute to metabolic heterogeneity of tumors identified by NAD(P)H FLIM.

      (4) Clarify the design of the experiments: The results come from 50 - 200 cells in each sample (except 30 in the CaCo2 cell culture) that were counted from 5 - 10 images acquired from each sample. There were 21 independent human samples. How many independent samples were included in the cell culture experiments and the mouse tumor models? Why is there an order of magnitude fewer cells included in the CaCo2 group compared to the other groups (Figure 1)? From the image (Figure 1A - CaCo2), it seems to be a highly populated type of sample, yet only 30 cells were quantified. What prevents the inclusion of the same number of cells to be quantified in each group for a more systematic evaluation?

      We thank the Reviewer for this comment.

      Cell culture experiments included two independent replicates for each cell line, the data from which were then combined. In animal experiments, measurements were made in three mice (numbered 1-3 in Figure 2C) for each tumor type. We have made calculations for more than 100 additional cells of the CaCo2 cell line. In the revised version, the number of CaCo2 cells is 146.

      The text of the Manuscript was revised accordingly.

      (5) Regarding references: Some claims throughout the text would benefit from an additional reference. For example: line 70 "Metabolic heterogeneity [...] is believed to have prognostic value"; line 121 " [...] the uniformity of cell metabolism in a culture, which is consistent with the general view on standard cell lines [...]". The clinical translational aspect (i.e., paragraph in line 255) warrants the inclusion of the efforts already done with FLIM imaging in the clinical setting both in vivo and ex vivo with point-spectroscopy and macroscopy imaging (e.g., Jo Lab, Marcu Lab, French Lab, and earlier work by Mycek and Richards-Kortum in colorectal cancer to name a few).

      Additional references were added.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Introduction, line 85, the authors mention that "Specifically, the unbound state of NAD(P)H has a short lifetime (~0.4 ns) and is associated with glycolysis, while the protein-bound state has a long lifetime (~1.7-3.0 ns) and is associated with OXPHOS". I do not think this claim is appropriate. One cannot simply say that the unbound state is associated with glycolysis, nor that the bound state is associated with OXPHOS; both unbound and bound state are associated with almost all the metabolic pathways. Instead, the expression of "glycolytic/ OXPHOS shift", as authors used in other sections of this manuscript, is a more appropriate one in this case.

      The text of the Introduction was revised.

      (2) What are the biological implications of the bimodality index (BI)? Please provide specific insights.

      A bimodal distribution indicates that there are two separate and independent peaks in the population data. In the metabolic FLIM data, this indicates that there are two sub-populations of cells with different metabolic phenotypes. Previously, we observed a bimodal distribution in a population of chemotherapy-treated cancer cells, where one sub-population was responsive (shifted metabolism) and the other was non-responsive (unchanged metabolism) [Shirshin et al., PNAS, 2022]. In a naive tumor, a number of factors have an impact on cellular metabolism, including genetic features and the microenvironment, so it is difficult to determine which ones resulted in bimodality. Our data on the correlation of bimodality (BI) with clinical characteristics of the tumors show that there are no associations between them. What really matters is the width of the parameter spread in the population. The early-stage tumors (T1, T2) were metabolically more heterogeneous than the late-stage ones (T3, T4). The degree of heterogeneity was also associated with differentiation state, a stage-independent prognostic factor in colorectal cancer where a lower grade correlates with a better prognosis. The early-stage (T1, T2) and high-grade (G3) tumors had significantly higher dispersion of NAD(P)H-a1 than the late-stage (T3, T4) and low-grade ones (G1, G2). From the point of view of the biological significance of heterogeneity, this means that in the stressful and unfavorable conditions to which tumor cells are exposed, the spread of the parameter distribution in the population, rather than the presence of several distinct clusters (modes), matters for adaptation and survival. The high diversity of cellular metabolic phenotypes provided a survival advantage, and so was observed in the more aggressive (undifferentiated or poorly differentiated) and the least advanced tumors.

      The discussion has been expanded on this account.

      (3) Have you run statistics in Figure 1B? If yes, do you find any significance? The same question also applies to Figures 2C and 3C.

      We performed statistical analysis to compare different cell lines in in vitro and in vivo models, the results obtained are presented in Table S4.

      (4) Line 119, why is the BI threshold set at 1.1?

      When setting the BI threshold at 1.1, we relied on the work by Wang et al., Cancer Informatics, 2009. The authors recommended the 1.1 cutoff as more reliable for selecting bimodally expressed genes. Further, we validated this BI threshold for identifying chemotherapy-responsive and non-responsive sub-populations of cancer cells (Shirshin et al., PNAS, 2022).
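
      For readers unfamiliar with the bimodality index, the sketch below shows how such a cutoff could be applied. The response does not restate the formula; the expression used here, BI = sqrt(π(1−π))·|μ1−μ2|/σ for a two-component Gaussian mixture with a shared standard deviation, is the definition as commonly attributed to Wang et al. (2009) and should be checked against that paper. The function names, the strict `>` comparison, and the omission of the mixture-fitting step are assumptions made for illustration only.

```python
import math


def bimodality_index(mu1: float, mu2: float, sigma: float, pi: float) -> float:
    """Bimodality index of a fitted two-component mixture (hypothetical helper).

    mu1, mu2 : component means
    sigma    : shared standard deviation of the two components
    pi       : proportion of cells assigned to the first component
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    delta = abs(mu1 - mu2) / sigma          # standardized distance between modes
    return math.sqrt(pi * (1.0 - pi)) * delta


def is_bimodal(bi: float, threshold: float = 1.1) -> bool:
    """Apply the 1.1 cutoff mentioned in the response (strict '>' is assumed)."""
    return bi > threshold


if __name__ == "__main__":
    bi = bimodality_index(mu1=0.0, mu2=3.0, sigma=1.0, pi=0.5)
    print(bi, is_bimodal(bi))
```

      In practice the mixture parameters would first be estimated from the per-cell FLIM values (for example, NAD(P)H-a1) before the index is computed; that fitting step is not shown here.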

      (5) Line 123, what does the high BI of mean lifetime stand for? Please provide biological implications and insights.

      The sentence was removed because the inclusion of additional CaCo2 cells (n=146) for quantification of the NAD(P)H FLIM data showed no bimodality in this cell culture.

      (6) In the legend for Figure 2C, the authors mention that "the bimodality index (BI-a1) is shown above each box"; however, I do not see such values. It is also true for Figure 3C.

      The legends for Fig. 2 and 3 were corrected.

      (7) In Figure 2, t1-t3 were not explained and mentioned in the main text. What do they mean? Do they mean different time points or different tumors?

      t1-t3 denote different tumors in a group. Changes have been made to the figure: individual tumors are now indicated by numbers.

      (8) In Figure 3, what do p13, p15 and p16 mean? It is not clearly explained. If they just represent patients numbered 13, 15, and 16, then why are these patients chosen as representatives? Do they represent different stages or are they just chosen randomly?

      Figure 3 was revised. Representative images were changed and a short description for each representative sample was included. In the revised version, representatives have been selected to show different stages and grades.

      (9) In Figure 3, instead of showing the results for each patient, I would suggest that authors show representative results from tumors at different stages; or, at least, clearly indicate the specific information for each patient. I do not think that providing the patient number only without any patient-specific information is helpful.

      Figure 3 was revised.

      (10) The sample number (21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      Additional eight samples were added. The text, figures and tables were revised accordingly.

      (11) In Discussion, it would be helpful to compare the BI index used in this study with the previously developed OMI-index (Line 275).

      We believe that the BI index and the OMI index describe different things and, therefore, it is hard to compare them. While the BI index is used to describe the degree of metabolic heterogeneity, the OMI index is an integral parameter that includes the redox ratio and the mean fluorescence lifetimes of NAD(P)H and FAD, and rather indicates the metabolic state of a cell. In this sense it is more relevant to compare it with the conventional redox ratio or the Fluorescence Lifetime Redox Ratio (FLIRR) (H. Wallrabe et al., Segmented cell analyses to measure redox states of autofluorescent NAD(P)H, FAD & Trp in cancer cells by FLIM, Sci. Rep. 2018; 8: 79). The assessment of the heterogeneity of the FLIM parameters has been previously reported using the weighted heterogeneity (wH) index (Amy T. Shah et al, In Vivo Autofluorescence Imaging of Tumor Heterogeneity in Response to Treatment, Neoplasia 17, pp. 862–870 (2015)). To the best of our knowledge, this is the only metric to quantify metabolic heterogeneity on the basis of FLIM data to date. A comparison of BI with the wH-index showed that the wH-index provides results similar to BI in the heterogeneity evaluation, as demonstrated in our earlier paper (E.A. Shirshin et al, Label-free sensing of cells with fluorescence lifetime imaging: The quest for metabolic heterogeneity, PNAS 119 (9) e2118241119 (2022)). Yet, the BI provides a dimensionless estimate of the inherent heterogeneity of a sample, and therefore it can be used to compare heterogeneity assessed by different decay parameters and FLIM data analysis methods. The limitation of using the OMI index for FLIM data analysis is the low intensity of the FAD signal, which was the case in our experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We would like to see the major conclusions constrained to better fit the data presented in the manuscript. Speed is only a single performance metric of a very complicated, very diverse system of locomotion.

      If the authors would like to maintain the broader conclusions, the study should be repeated with a number of different performance metrics to shore up the manuscript's results. Particularly with efficiency, speed is not a reliable measure of efficiency to begin with, so this needs to be explored in a more targeted and appropriate manner.

      We agree with Reviewer 1 that we should be more precise about the fitness metrics used and more constrained about the conclusions. Considering the points raised in each paragraph, we’ve modified the text as follows:

      - [line 17] “... to test the necessity of both traits for sustained and effective displacement on the ground.”

      - [starting on line 105] “We generate the robot’s sample using an artificial evolutionary process that selects for better locomotion ability - defined as higher average speed as it is a proxy for organisms with sustained and effective displacement.”

      - [starting on line 287] “We also found that different gravitational environments require different shape structures to optimize locomotion average speed.”

      - [starting on line 311] “This consistency is evidence that a small number of sparsely connected modules is a morphological computation principle for an organism’s optimized average speed.”

      - [starting on line 348] “Beyond that, extending the tests for other important aspects of locomotion behavior - as noise on the ground, energetic costs, and maneuverability - by using other locomotion metrics - as energy efficiency, stability margin, and dissipated power (Paez and Melo, 2014; Aoi et al., 2016) - would also be relevant to evaluate the principle’s robustness.”

      - [starting on line 524] “As the robots with the highest average speed are the ones that succeed in maximizing displacement and having robust dynamics (they will not tumble with time), we defined $\bar s$ as the fitness value using it as a proxy of successful directed locomotion. Selecting for bodies that maximize speed is a common locomotion bias in natural selection, as both predators and prey and thus fecundity and mortality depend on it (Alexander, 2006). Other measures - such as energy efficiency - can capture distinct important aspects of the locomotion complexity (Paez and Melo, 2014) and would be worthy of investigating in future work.”

      Paper Premise/Mission Statement: As defined in the abstract and also called out in the text starting on line 59 is "investigate whether symmetry and modularity are features of an organism's shape need [authors italics] to have for better-directed locomotion..."

      If we understood correctly, the reviewer is asking for more precision in the statement. We modified the respective sentence as follows:

      - [line 62] “... need to have for optimizing average speed on the ground,”

      Reviewer #2 (Recommendations For The Authors):

      i) a lot of details that are in the captions should be moved in the main text;

      Thank you for this comment. We reviewed all the captions and text making modifications to ensure that all the information in the captions is also present in the main text. Below, we highlighted some of the changes:

      - [line 57] “Thus, locomotion on the ground is present in phylogenetically distant species (such as the maned wolf and frogfish in Figure 1A) and depends upon … “

      - [starting on line 64] “Figure 1B shows a schematic representation of symmetry and modularity on the maned wolf and frogfish bodies.”

      - [starting on line 277] “There is a negative correlation between the proportion of feet voxels and the robot’s locomotion transference capability when the robots go to an environment with higher gravity, i.e., water to mars (dark blue in Figure 5C), water to earth (light blue), and mars to earth (red) - with Spearman correlation coefficients of r = -0.39, r = -0.43, and r = -0.32, respectively, all with p < 1e-08.”
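
      For readers who want to reproduce this kind of statistic, the following is a minimal sketch of a Spearman rank correlation using only the Python standard library. The variable names `feet_fraction` and `transfer_score` and all data values are hypothetical illustrations, not data from the manuscript, and the sketch does not compute a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: proportion of feet voxels vs. a transference score.
feet_fraction = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
transfer_score = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2]
rho = spearman(feet_fraction, transfer_score)
print(f"Spearman r = {rho:.3f}")
```

      A negative value of `rho` indicates that robots with a larger proportion of feet voxels tend to rank lower on the transference score, which is the direction of the relationship reported above.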

      ii) hypotheses should be spelled out more clearly;

      We verified the experiments and certified that every experiment had a clear hypothesis statement in the original manuscript. Before each section defining the hypothesis and describing the experiment, we added the following statement:

      - [starting on line 119] “ With this sample, we tested the hypotheses about the relationships between locomotion performance and body modularity and symmetry (Figure 1I).”

      iii) performance metrics and other features should be better defined using mathematical terms if possible (for example, instability);

      Thank you for the comment. We added a definition for instability in the text:

      - [starting on line 218] “Nonetheless, locomotion requires a minimum instability - the dynamic possibility of translating the center of mass - in the direction axis to generate the necessary forward displacement (Bruijn et al., 2013; Nagarkar et al., 2021).”

      Despite the different definitions of instability in the literature (Bruijn et al., 2013; Paez and Melo, 2014; Aoi et al., 2016; Nagarkar et al., 2021), we didn’t find one mathematical definition that fits perfectly in our context.

      Following the reviewer's comment, when necessary we expanded the definition for other features:

      - [starting on line 199] “... the distribution of body weight. As the robots do not have sensory feedback abilities, the weight balance is defined as the body’s movement due to gravity forces (consequences of the weight distribution and surface contact points) (Benda et al., 1994). We hypothesized that the robots with the best directed locomotion ability would tend to have a symmetric body shape. A robot with a low XY shape symmetry (XY shape symmetry < 0.5) has a higher chance of having a poor weight balance, increasing the chance of the body tipping over, thus leading it to a lousy locomotion performance (blue dotted line in Figure 3C). “

      iv) more details regarding the simulations should be included;

      We thank the reviewer for this comment. If we understood correctly, Reviewer 2 is asking for more details regarding: “a) the adequacy of the spatial resolution, whereby I failed to see a compelling argument regarding the completeness of 64 voxels; b) the realism of the oscillatory patterns, whereby all the voxels are set to oscillate at the same, constant, frequency of 2Hz; and c) the accuracy of simulations in water where added mass effects seem to be neglected.” We modified the text to better address these concerns:

      a) [starting on line 96] “We choose to first explore exhaustively the $4^3$ space dimension, as it is the minimal possible space that allows meaningful body plans. We also did control experiments within $6^3$ and $8^3$ to check for dimension size effects.”

      - [starting on line 432] “We did control experiments with robots within 6³ and 8³ dimensions to check for dimension size effects - and we found that the results found in 4³ remained valid. We choose to focus our analysis in the 4³ design space because we consider it the minimum coarse-grain to approach the biological question about the contingency of shape outcomes pressured for locomotion. Smaller spaces do not allow sufficient complexity in the body structures, and increasing spatial resolution reduces the extensiveness of the investigated search space.”

      b) [starting on line 451] “… we used a fixed oscillation frequency of 𝑓 = 2 Hz (Kriegman et al., 2020). A fixed frequency value reduces the number of degrees of freedom in the search for solutions, but in return, it narrows the direct connection between the simulated organisms and animals. Exploring different frequency values in future work would be important to investigate the impact of varied oscillatory frequencies in the shape solutions for directed locomotion.”

      c) The environment we call “water” is not an accurate model of aquatic habitats, as we did not simulate essential forces such as drag. This choice is explained in the text starting on line 110: “In the water-like environment the bodies have nullifying body weight but do not have drag effects. We did not add drag in our simulations because our aim is to study just the body weight influences in locomotion independently of other forces.”
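
      To make the trade-off in point a) concrete, the sketch below counts the voxels in each cubic design space and gives a naive upper bound on the number of candidate bodies. The number of states per voxel (`STATES_PER_VOXEL`) is a hypothetical value chosen for illustration, not a parameter taken from the manuscript, and the bound ignores any connectivity or validity constraints on the bodies.

```python
def voxel_count(side):
    """Number of voxels in a cubic design space with the given side length."""
    return side ** 3


def configuration_upper_bound(side, states_per_voxel):
    """Naive upper bound: every voxel independently takes one of the states."""
    return states_per_voxel ** voxel_count(side)


# Hypothetical assumption: each voxel is empty, passive, or active.
STATES_PER_VOXEL = 3

for side in (4, 6, 8):
    bound = configuration_upper_bound(side, STATES_PER_VOXEL)
    digits = len(str(bound))
    print(f"{side}^3 space: {voxel_count(side)} voxels, "
          f"upper bound has {digits} decimal digits")
```

      Under this assumption the bound grows exponentially with the voxel count, which illustrates why a fixed search budget covers a much smaller fraction of a higher-resolution space.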

      v) a full paragraph about limitations should be included in the discussions, focusing on both simulation aspects (for example, the use of simple spring elements in the voxels) and theoretical assumptions (for example, addressing the potential role of non-locomotion-related aspects).

      We thank the reviewer for the comment. We edited some paragraphs of the discussion section to make more explicit some limitations of our work:

      [starting on line 398] “We expect that including other important aspects of an animal's body as a developmental process and sensory functions could influence the shape's outcomes with other layers of principles. Although we based our simulations on an already successful transference of in silico behavior to organisms made of biological tissue (Kriegman et al., 2020), there is an intrinsic gap between spring-mass robots modeling and animal’s bodies that is worthy of exploring to ensure the generality of our results. Other methods, such as the inclusion of rigid body elements in the simulation (possible in Voxelyze), the use of finite element modeling (FEM) (Coevoet et al., 2019), and the construction of physical robots (Aguilar et al., 2016), are important complements to this work. Beyond that, principles on other scales as in the genotypes (Johnston et al., 2022) and in other behavioral phenotypes (Gomez-Marin et al., 2016) could also be investigated.”

      To address the potential role of non-locomotion-related aspects, we revised the section “Discussion - Contingency of evolutionary outcomes”, where we discussed other functional and biological roles:

      [starting on line 354] “Here we investigate how a specific functional cause - optimization of average speed during directed locomotion on the ground - externally defines the phenotypic space of shape possibilities.”

      [starting on line 359] “For simplification purposes, we choose to not explicitly control other important factors of locomotion (i.e., energy consumption, maneuverability) that nonlinearly interact during locomotion. In future studies, it would be important to conduct similar studies on a wider range of factors to study the shape and dynamic principles in different conditions.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      Mutational analysis of diffuse midline glioma (DMG) found that ACVR1 mutations, which up-regulate the BMP signaling pathway, are found in most H3.1K27M, but not H3.3K27M, DMG cases. In this manuscript, Huchede et al attempted to determine whether the BMP signaling pathway has any role in H3.3K27M DMG tumors. They found that BMP signaling is activated to a similar level in H3.3K27M DMG cells with wild-type ACVR1 compared to ACVR1 DMG cells, likely due to the expression of BMP7 or BMP2. They went on to test whether treatment with BMP7 or BMP2 affected the gene expression and cell fitness of tumor cells with the H3.3K27M mutation. They concluded that BMP2/7 synergizes with H3.3K27M to induce a transcriptomic rewiring associated with a quiescent but invasive cell state. The major issue with this conclusion is that the authors did not use the right models/controls to obtain results that support it, as detailed below. Therefore, to strengthen the conclusion, the authors need to address the major concerns below.

      Strength:

      This paper addresses an important question in the DMG field.

      Major concerns/weakness:

      (1) All the results in Fig. 2 utilized two glioma lines SF188 and Res259. The authors should repeat all these experiments in a couple of H3.3K27M DMG lines by deleting the H3.3K27M mutation first.

      We thank the referee for his/her comments, which have helped us to strengthen our conclusions. Although we were mainly interested in studying how the BMP pathway can participate in installing a particular cell state at the time of expression of the K27M mutation, we have now included the characterization of the native H3.3K27M BT245 and SU-DIPGXIII cell lines and of their counterparts in which the mutation was reverted by CRISPR-Cas9 (Harutyunyan et al., 2019). As shown in Figure 3-figure supplement D, the growth arrest induced by BMP2 does indeed seem to be specific to the K27M epigenetic context, which could also be required to establish a positive regulatory loop that activates the BMP pathway, as mentioned in the Discussion.

      (2) Fig. 3. The experiments of BMP2 treatment should be repeated in other H3.3K27M DMG lines using H3.1K27M ACVR1 mutant tumor lines as controls.

      The use of mutant ACVR1 lines is interesting, but their status as controls seems questionable, as the addition of BMPs could add to the effect of the mutation, notably by activating other receptors in the pathway. We have nevertheless now included 3 different cell lines (HSJD-DIPG-014, BT245 and SU-DIPGXIII) and observed a similar impact of BMP2, with growth arrest as a readout (Figure 3-figure supplement C-D).

      Minor concerns

      Fig.2A. BMP2 expression increased in H3.3K27M SF188 cells. Therefore, the statement "whereas BMP2 and BMP4 expressions are not significantly modified (Figure 2A and Figure 2-figure supplement A-B)" is not accurate.

      The referee is absolutely right, and we have corrected this statement.

      Reviewer #2 (Public Review):

      The manuscript by Huchede et al investigates the BMP pathway in H3K27M-mutant gliomas carrying or not activating mutations in ALK2 (ACVR1). Their results in cell lines and in datasets acquired from the literature on patient tumors indicate that the BMP signaling pathway is activated at similar levels between ACVR1 wild-type and mutant tumors. The group further identifies BMP2 and BMP7 as possibly the main activators of the pathway in cells. They then show that BMP2 and 7 crosstalk with the H3 mutation and synergize to induce transcriptomic rewiring leading to an invasive cell state.

      The paper is well-written and easy to follow with a robust experimental plan and datasets supporting the claims. While previous work (acknowledged by the authors) indicated activation of BMP in H3K27M tumors, wild type for the ACVR1 mutation this paper is a nice addition and provides further mechanistic cues as to the importance of the BMP pathway and specific members in these deadly brain cancers. The effect of these BMPs in quiescence and invasion is of particular interest.

      We thank the referee for his/her supportive comments.

      A few suggestions to clarify the message are provided below.

      (1) In thalamic diffuse midline gliomas, the BMP pathway should not be activated as it is in the pons. The authors should identify thalamic tumors in the datasets they explored and patient-derived cell lines from thalamic tumors available to investigate whether this pathway is active across all H3.3K27M mutants in the brain midline or specifically in tumors from the pons.

      The inter-patient variability observed in the level of activation of the BMP pathway may indeed be due, at least in part, to different tumor locations. However, we could not find this information in the publicly available datasets that we used. We have nevertheless included this point in the Discussion.

      (2) There are ~20% H3.3K27M tumors that carry an ACVR1 mutation and similar numbers of H3.1K27M that are wild type for this gene. Can the authors identify these outliers in their datasets and assess the activation of BMP2 and 7 or other BMP pathway members in this context?

      We have now included the outliers present in our datasets in the legends of Figure 1B and Figure 1-figure supplement B and F. From the few samples available to document these outliers in the cohorts that we used, we have not observed major differences regarding the expression levels of BMP2/7 or BMP pathway members and have discussed the fact that it may result from the establishment in all cases of a feedback loop of activation.

      In all, this is an interesting paper that provides meaningful data to pursue clinical targeting of the BMP pathway, which would be a nice addition to the field.

      We thank the reviewer for his/her supportive comments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markov coalescent model that allows simultaneous analysis of multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes. Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Major comments:

      - For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      We believe this comment was addressed in the previous revision (Supp. Tables 6-10) by adding root-mean-square errors (RMSE) for the demographic estimates, including separate RMSE values for the recent and past portions of the demography.
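
      As an illustration of the kind of summary the reviewer requested, the sketch below computes an RMSE over a whole demographic trajectory and separately for its recent and past portions. The function names, the time cutoff, and all numbers are hypothetical; this is not the code used to produce the supplementary tables.

```python
import math


def rmse(true_values, estimates):
    """Root-mean-square error between two equal-length sequences."""
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - e) ** 2 for t, e in zip(true_values, estimates)]
    return math.sqrt(sum(squared) / len(squared))


def split_rmse(times, true_sizes, est_sizes, cutoff):
    """Return (recent, past) RMSE, splitting time points at `cutoff`."""
    recent = [(t, e) for time, t, e in zip(times, true_sizes, est_sizes)
              if time < cutoff]
    past = [(t, e) for time, t, e in zip(times, true_sizes, est_sizes)
            if time >= cutoff]
    recent_rmse = rmse([t for t, _ in recent], [e for _, e in recent])
    past_rmse = rmse([t for t, _ in past], [e for _, e in past])
    return recent_rmse, past_rmse


# Hypothetical toy trajectory: time in generations, population sizes.
times = [10, 100, 1000, 10000]
true_sizes = [1000, 1000, 5000, 5000]
est_sizes = [1100, 900, 5000, 4000]

overall = rmse(true_sizes, est_sizes)
recent_err, past_err = split_rmse(times, true_sizes, est_sizes, cutoff=500)
print(overall, recent_err, past_err)
```

      Reporting the two portions separately shows whether a method gains accuracy in the recent past at the expense of the ancient past, which a single overall value can hide.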

      - 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour. (Comment addressed in revision. Still, I find the explanation added at 449ff to be somewhat puzzling -- shouldn't the results of the regional HMM scan only improve if the true mutation rate is given?)

      We understand that our results and explanation can appear counter-intuitive. As acknowledged by the reviewer, in the previous round of revision we clarified this puzzling behaviour at length: it arises from a discrepancy in how methylation regions are assessed by the HMM method, which differs from the HMM used for the SMC inference. We are happy to clarify further in response to the new question of Reviewer #1:

      If Reviewer #1 means the SNP mutations (e.g. A → T), knowing the true mutation rate does not help the HMM to recover the region-level methylation status.

      If Reviewer #1 means the epimutations (whether at the region level, the site level, or both), knowing the true epimutation rates could theoretically help the HMM to recover the region-level methylation status. However, at present, our method does not leverage information from epimutation rates to infer the region-level methylation status. Because inferring the epimutation rates is one of the goals of the SMC inference in this study, and the region-level methylation status is required to infer those rates, we suspect that using epimutation rates to infer the region-level methylation status could be statistically inappropriate (generating a kind of circular estimation). Instead, our HMM uses only the proportions of methylated and unmethylated sites (estimated from the genome) to determine whether a region is most likely to be methylated or unmethylated. We now state this explicitly in the description of the HMM for methylation regions in the Methods section.

      We acknowledge that our HMM to infer region level methylation status could be improved, but this would be a complete project and study on its own (due to the underlying complexity of the finite site and the lack of a consensus model for epimutations at evolutionary time scale). We believe our HMM to have been the best compromise with what was known from methylation and our goals when the study was conducted, and future work is definitely worth conducting on the estimation of the methylation regions.
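
      To illustrate the general idea of assigning a region-level status from site-level calls without using epimutation rates, the sketch below runs a generic textbook two-state Viterbi decoder. It is not the HMM used in the study: the function name, the `stay` and `emit_match` probabilities, and the toy data are all hypothetical, and the only element taken from the description above is that the genome-wide proportion of methylated sites enters as an input (here, as the initial state probability).

```python
import math


def viterbi_region_status(sites, p_methylated, stay=0.95, emit_match=0.9):
    """Most likely region status per site ('M' or 'U').

    sites        -- list of 0/1 site calls (1 = methylated)
    p_methylated -- genome-wide proportion of methylated sites, in (0, 1)
    stay         -- hypothetical probability of keeping the same status
    emit_match   -- hypothetical probability a site call matches its region
    """
    if not sites:
        return []
    states = ("M", "U")

    def log_trans(a, b):
        return math.log(stay if a == b else 1.0 - stay)

    def log_emit(state, obs):
        matches = (state == "M") == (obs == 1)
        return math.log(emit_match if matches else 1.0 - emit_match)

    log_init = {"M": math.log(p_methylated), "U": math.log(1.0 - p_methylated)}
    score = {s: log_init[s] + log_emit(s, sites[0]) for s in states}
    back_pointers = []

    for obs in sites[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this site.
            prev = max(states, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, obs)
            pointers[s] = prev
        back_pointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical site calls: a methylated stretch with one discordant site,
# followed by an unmethylated stretch.
calls = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
status = viterbi_region_status(calls, p_methylated=0.5)
print("".join(status))
```

      With these hypothetical parameters a single discordant site inside a methylated stretch is absorbed into the surrounding region, which is the smoothing behaviour one expects from a region-level call.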

      - As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases. (Comment addressed in revision via Supp. Table 7.).

      - A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      We believe this comment was acknowledged in the previous revision (line 649), and we thank the reviewer for this interesting insight.
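
      The reviewer's suggestion can be checked numerically with a short sketch. For two non-interacting continuous-time Markov chains, the joint rate matrix is the Kronecker sum of the individual rate matrices, and its matrix exponential factorizes into the Kronecker product of the individual transition matrices. The two-state processes and all rate values below are hypothetical, and the Taylor-series exponential is used only to keep the example free of external dependencies.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product: entry (i*nb + k, j*nb + l) equals a[i][j] * b[k][l]."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def kron_sum(a, b):
    """Kronecker sum: A (x) I + I (x) B, the joint rate matrix."""
    return mat_add(kron(a, identity(len(b))), kron(identity(len(a)), b))


def two_state_rates(rate_01, rate_10):
    return [[-rate_01, rate_01], [rate_10, -rate_10]]


def two_state_transition(rate_01, rate_10, t):
    """Closed-form exp(Q t) for a two-state chain."""
    total = rate_01 + rate_10
    decay = math.exp(-total * t)
    p0, p1 = rate_10 / total, rate_01 / total  # stationary probabilities
    return [[p0 + p1 * decay, p1 - p1 * decay],
            [p0 - p0 * decay, p1 + p0 * decay]]


def expm_series(q, t, terms=40):
    """Taylor-series matrix exponential; adequate when rate * time is small."""
    scaled = [[x * t for x in row] for row in q]
    result, term = identity(len(q)), identity(len(q))
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = mat_add(result, term)
    return result


# Hypothetical rates for two independent two-state processes.
T = 2.0
joint_direct = expm_series(
    kron_sum(two_state_rates(0.3, 0.1), two_state_rates(0.02, 0.05)), T)
joint_factored = kron(two_state_transition(0.3, 0.1, T),
                      two_state_transition(0.02, 0.05, T))
max_diff = max(abs(x - y)
               for rd, rf in zip(joint_direct, joint_factored)
               for x, y in zip(rd, rf))
print(max_diff)
```

      The factored form only ever exponentiates the small per-process matrices, which is the speed advantage the reviewer points to when the number of processes grows.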

      - Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      We believe this comment was addressed in the previous revision and was added to the manuscript in the Methods section (subsection: SMC optimization function).

      - 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533? (Comment addressed in revision.)

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why the concept of this work would not be sound. Especially in the future as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      We again thank Reviewer #2 for the positive comments.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems reasonable and in principle the inclusion of highly mutable sites is a nice advance. This is an exciting new avenue for thinking about inference from genomic data. I remain a bit concerned about how well this will work in systems where much less is understood about methylation.

      The authors include some good caveats about applying this approach to other systems, but I think it would be helpful to empiricists outside of thaliana or perhaps mammalian systems to be given some indication of what to watch out for. In maize, for example, there is a nonbimodal distribution of CG methylation (35% of sites are greater than 10% and less than 90%) but this may well be due to mapping issues. The authors solve many of the issues I had concerns with by using gene body methylation, but this is only briefly mentioned on line 659. I'm assuming the authors' hope is that this method will be widely used, and I think it worth providing some guidance to workers who might do so but who are not as familiar with this kind of data.

      We thank Reviewer #3 for the positive comments. We agree with Reviewer #3 concerning the application to data, and that our approach needs to be carefully considered before being applied. Our results clearly show that methylation processes are not yet understood well enough to apply our approach as we initially (maybe naively) designed it. Further investigations need to be conducted and appropriate theoretical models need to be developed before reliable results can be obtained, and we hope that our discussion points this out. However, our approach, the theoretical models and the additional tools contained in this study can help researchers investigate whether or not to use different genomic markers to build a common (potentially more reliable) ancestral history. In this second revision we enhanced the discussion by also clarifying the use of methylation from genic regions to avoid confusion (lines 700-731).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      In added Supp. Table 7, I don't think these are in log10 units as stated in the caption.

      Well spotted! Indeed, the RMSE is not in log10 scale; we corrected the caption. We also added that the TMRCA used for the RMSE calculations is in units of generations to avoid potential confusion.

      Reviewer #3 (Recommendations for The Authors):

      I very much appreciate the authors' attention to previous questions. I would ask that a bit more is spent in the discussion on concerns/approaches empiricists should keep in mind -- I am wary of this being uncritically applied to data from non-model species. It was not clear to me, for example (only mentioned on line 659 in the discussion) that the thaliana data is only using gene-body methylation. This poses potential issues with background selection that the authors acknowledge appropriately, but also assuages many of my concerns about using genome-wide data. I think text with recommendations for data/filtering/etc or at least cautions of assumptions empiricists should be aware of would help.

      We apologize for the confusion at line 659. As written in the other section of the manuscript we meant CG sites in genic regions (and not only gene body methylated regions).

      Due to the manuscript’s structure, the data from Arabidopsis thaliana are only described at the very end of the manuscript (line 900+). However, a brief description can also be found at lines 291-296. We have also added a sentence in the introduction (line 128) for clarity.

      We agree with the comment made by Reviewer #3 concerning the application to data. In the discussion, we pointed out the risk of applying our approach to ill-understood (or ill-prepared) data and stressed the current need for studies of epimutation processes at the evolutionary time scale (i.e. at the Ne time scale) (lines 700-703).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      Clostridium thermocellum serves as a model for consolidated bioprocess (CBP) in lignocellulosic ethanol production, yet faces limitations in the solid contents and ethanol titers achieved by engineered strains thus far. The primary ethanol production pathway involves the enzyme aldehyde-alcohol dehydrogenase (AdhE), which forms long oligomeric structures known as spirosomes, previously characterized via the 3.5 Å resolution E. coli AdhE structure using single-particle cryo-EM. The present study describes the cryo-EM structure of the C. thermocellum ortholog, sharing 62% sequence identity with E. coli AdhE, resolved at 3.28 Å resolution. Detailed comparative structural analysis, including the Vibrio cholerae AdhE structure, was conducted. Integrating cryo-EM data with molecular dynamics simulations indicated that the aldehyde intermediate resides longer in the channel of the extended form, supporting the hypothesis that the extended spirosome represents the active form of AdhE.

      Strengths: 

      The study conducts a comprehensive structural comparative analysis of oligomerization interfaces and the acetaldehyde channel across compact and extended conformations. Structural and computational results suggest the extended spirosome as the most likely active state of AdhE. 

      Weaknesses: 

      The overall resolution of the C. thermocellum structure is similar to the E. coli ortholog, which shares 62% sequence identity, and the oligomerization interfaces and the acetaldehyde channel were previously described. 

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Ziegler et al, entitled "Structural characterization and dynamics of AdhE ultrastructure from Clostridium thermocellum: A containment strategy for toxic intermediates?" presents the atomic resolution cryo-EM structure of C. thermocellum AdhE, showing that it predominantly adopts an extended form, while E. coli AdhE predominantly adopts a compact form. Through comparative analysis of their C. thermocellum structure and the previous E. coli AdhE structure, they tried to reveal the mechanism by which C. thermocellum and E. coli show different dominant conformations. In addition, they also analyzed the substrate channel by comparative and computational approaches. Lastly, their computational analysis using CryoDRGN reveals conformational heterogeneity in the sample. Although this manuscript suggests a potential mechanism for the different features of AdhEs, it is very descriptive and does not provide sufficient data to support the authors' conclusions, which may be due to the lack of experimental data to support their findings from the computational analysis.

      Strengths: 

      This manuscript provides the first C. thermocellum (Ct) AdhE structure and analyzes it in comparison with E. coli AdhE.

      Weaknesses: 

      Their main conclusions obtained mostly by computational and comparative analysis are not supported by experimental data. 

      Reviewer #3 (Public Review): 

      This study describes the first structure of Gram-positive bacterial AdhE spirosomes that are in a native extended conformation. All the previous structures of AdhE spirosomes obtained come from Gram-negative bacterial species with native compact spirosomes (E. coli, V. cholerae). In E. coli, AdhE spirosomes can be found in two different conformational states, compact and extended, depending on the substrates and cofactors they are bound to.

      The high-resolution cryoEM structure of the extended C. thermocellum AdhE spirosomes produced in E. coli in an apo state (without any substrate or cofactors) is compared to the E. coli extended and compact AdhE spirosomes structures previously published. The authors have modeled (in Swiss-Model) the structure of compact C. thermocellum AdhE spirosomes, using E. coli compact AdhE spirosome conformation as a template, and performed molecular dynamics simulations. They have identified a channel in which the toxic reaction intermediate aldehyde could transit from the aldehyde dehydrogenase active site to the alcohol dehydrogenase active site, in an analogous manner to E. coli spirosomes. These findings are in line with the hypothesis that the extended spirosomes could correspond to the active form of the enzyme. 

      In this work, the authors speculate that the C. thermocellum AdhE spirosomes could switch from the native extended conformation to a compact conformation, in a way that is the inverse of E. coli spirosomes. Although attractive, this hypothesis is not supported by the literature. Notably, in some Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), AdhE spirosomes are natively extended and have never been observed in a compact conformation. In contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, CoA, and iron). The data as currently presented are not sufficient to confirm the existence of C. thermocellum AdhE spirosomes in a compact conformation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) The claim of achieving the highest resolution AdhE structure lacks strong support since the E. coli structure was solved at 3.5 Å, whereas the C. thermocellum structure was solved at 3.28 Å. Conducting a local resolution analysis could provide insights into distinct structural interpretations, enhancing the strength of the claim.

      We have modified the sentence claiming this as the highest resolution AdhE structure to say, “In this study, we presented and analyzed a high-resolution structure of the AdhE spirosome from C. thermocellum.” We have included the local resolution map in Figure 2C – all structural analysis was performed in regions from the center of the molecule, where the highest resolution information was determined.

      (2) The comparative structural analysis of the oligomerization interface is thorough, yet it could benefit from greater conciseness. Focusing on highlighting major findings would streamline the presentation and enhance clarity. 

      We altered a few places in the comparative structural analysis in response to other reviewers. We also divided the main structure section into two subsections (spirosome interfaces and AdhE active sites) to enhance clarity.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should change the title containing "?". Does it mean that the conclusions that the authors made are still in question?

      We have removed the question mark to indicate that our results point to a channeling mechanism.

      (2) Figure 1B: Clarify Ct Fwd. Is this adding NADH, and Ct Rev adding NAD+? 

      This information is described in the text in lines 98-100. It is also at the bottom of figure 1B.

      (3) Line 131: Please revise accordingly for clarity: "The extended dimer interfaces" → "The extended E. coli dimer interface".

      This has been edited for clarity. We have added the following sentences to indicate which interfaces are being discussed: “Both the E. coli and C. thermocellum extended dimer interfaces bury ~5000 Å². While the compact C. thermocellum compact dimer interface buries a similar surface area of ~4800 Å², the E. coli dimer interface buries ~3800 Å².”

      (4) Line 133-136: Why does that not seem to be the case? It is not clear what exactly the authors mean by these sentences.

      We altered the text to say, “One would expect the compact structure in E. coli to have a larger buried surface area due to it being the predominant form when it is examined without additives, but that is not the case; further corroborating that factors other than buried surface area must impact the apo state of the spirosome.” We hope this clarifies our intent.

      (5) Line 138-145: The authors should provide a logic for how the different distribution of the charged residues would change the form of AdhE. It may just be a different distribution that has nothing to do with the conformational change.

      After further analysis of the interface amino acid distribution, we agree that the distribution may have nothing to do with the conformational change. We have changed this section to end with the sentence “Analysis of the residues buried in these interfaces reveals that while many of the residues are identical in the C. thermocellum and E. coli extended structures, there are some differences in amino acid type distribution, although nothing that directly indicates control of conformer state (Supplemental Figure 3).”

      (6) Line 169: Kim et al. → Cho et al.

      We have corrected this error.

      (7) Line 122-235: The whole section is just describing the difference between Ct and Ec AdhE, suggesting that this difference may contribute to the conformational difference without any evidence. The authors cannot say that the differences in the interface, active sites, cofactor pockets, etc. explain why the two AdhEs (Ct, Ec) have different domain conformers unless they provide experimental data.

      We did not conclude that any differences we observed structurally were responsible for the conformation change. The purpose of this section was solely to compare the structures to determine if we could find a structural basis for the difference between E. coli and C. thermocellum conformation – we stated a few times throughout the section and in the discussion that there were no immediate structural reasons for this difference in shape. We have added a few sentences in the discussion to address whether Gram-positive vs. Gram-negative is influencing the shape, addressed in reviewer #3 comment #4.

      (8) Line 237: The whole section "Identification..." analyzed the substrate channel by computational analysis. The author should provide experimental evidence that these residues identified are critical for channeling by generating mutants and measuring their activity. 

      We agree that mutagenesis is the next logical step for these results, however it is outside the scope of work of this paper as this study will not be that straightforward. We have included a sentence in the discussion to indicate our plans for further investigation to the channel that says, “Future mutagenesis studies will be needed to confirm whether the spirosome exists to control the reaction flux in high-reactant conditions.”

      Reviewer #3 (Recommendations For The Authors): 

      (1) The capacity of C. thermocellum AdhE spirosomes to switch from a natively extended conformation to a compact conformation is not demonstrated in this manuscript, as it is now. Because this would be the first time that Gram-positive bacterial AdhE spirosomes are observed in a compact conformation, the authors should provide a clear demonstration of their existence by presenting reliable and good images of C. thermocellum compact spirosomes. 

      We have modified Figure 1A to zoom in on one compact and extended spirosome that we have identified from each C. thermocellum sample. We have included triangles of the same size and shape to indicate the proximity of a turn of a helix, showing that the identified compact spirosomes have a tighter conformation than extended spirosomes.

      (2) The authors should show at least an image of the compact C. thermocellum spirosomes that they claim to observe in the presence of NADH or in the forward reaction conditions mentioned in Figure 1. The authors have added different reactants to the extended C. thermocellum spirosomes and visualized their conformation by negative stain. An image of each condition tested would be valuable and would nicely complete the distribution of compact versus extended spirosomes presented in Figure 1.

      We have created a new supplemental figure with spirosomes circled for all of the experimental conditions for C. thermocellum (Supplemental figure 1). We have added a reference to supplemental figure 1 in the text to direct the reader to these images.

      (3) The cryoEM classes presented in Figure 8 are not convincing and could correspond to dimers or rosettes of AdhE or to E. coli endogenous AdhE. CryoEM classes showing longer compact C. thermocellum spirosomes should be shown. The percentage of these compact spirosomes visualized in the micrographs should be added and discussed in the text as it would increase confidence in these findings and confirm that C. thermocellum compact spirosomes exist. Heterologous production of C. thermocellum AdhE in E. coli depleted for its endogenous AdhE would be required to definitively prove that these are compact C. thermocellum AdhE spirosomes in the cryoEM. 

      We included the pictures of the theoretical compact spirosomes, as generated from the 8-mer of E. coli AdhE (6AHC), to address the possibility of rosettes. We have now indicated in the text that 6.7% of the particles were in the compact conformation, which is less than seen by negative stain. We further mentioned that the compact spirosome is less compact than that seen in E. coli. We added a sentence to the discussion about the possibility of contaminating E. coli spirosomes (though this is very unlikely) in our compact spirosome analysis: “While these compact spirosomes could result from expression in E. coli, though this is very unlikely, we also identified compact spirosomes in a native C. thermocellum lysate, which would not have similar contamination issues.”

      (4) The authors should include and discuss in the text previous findings (among which Laurenceau et al., 2015...) describing the differences between Gram-positive and Gram-negative spirosomes. AdhE spirosomes are natively extended in most Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), and have never been observed in a compact conformation. In contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, CoA, and iron).

      We have added the following sentences to the discussion to address this comment: “This could potentially be due to the differences between Gram-positive and Gram-negative bacteria. In previous studies, compact spirosomes have only been isolated from Gram-negatives while solely extended spirosomes have been isolated from Gram-positives. Furthermore, while the compact spirosomes can transition to extended in the presence of cofactors, the reverse has not been previously observed with an extended spirosome.”

      (5) The authors have spotted some differences between the E. coli and C. thermocellum structures that they believe could explain the intrinsic capacity of these spirosomes to be natively extended or compact. It would be interesting to confirm this hypothesis by measuring C. thermocellum extended AdhE spirosome activity and comparing it to E. coli extended spirosomes. The impact of mutations in the regions proposed by the authors to be important in the capacity of C. thermocellum AdhE to be extended (especially the GxGxxG motif and the D494 position) would be appreciated to confirm this hypothesis.

      We agree that this would be an interesting avenue of research although it is currently outside the scope of this paper. We are looking into experiments that we can perform where we can track both activity and conformation but have not found an ideal experiment at this time.

      (6) Many statements and result interpretations are overstated in several parts of the manuscript and would need to be rewritten to balance the absence of clear evidence of C. thermocellum compact spirosomes. 

      We have shown that we have identified compact spirosomes, addressed in multiple comments above. We have adjusted the language of the paper to indicate more uncertainty that will be followed up in future mutagenesis experiments. However, these mutations are not that simple to identify and this research would require a fairly large study that is better suited for a follow up manuscript.

      (7) The Figure 7 legend would need to be corrected.

      We are unsure as to what needs to be corrected in the figure 7 legend based on this comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      (1) In my assessment, the data sufficiently demonstrate that a modified version of Pertuzumab can bind both the wild-type and S310 mutant forms of ERBB2.

      (2) The engineering strategy employed is rational and effectively combines computational and experimental techniques.

      (3) Given the clinical activity of HER2-targeting ADCs, antibodies unaffected by ERBB2 mutations would be desired.

      Weaknesses:

      (1) There is no data showing that the engineered antibody is as specific as Pertuzumab, i.e. that it does not bind to other (non-ERBB2) proteins.

      Showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in the future.

      (2) There is no data showing that the engineered antibody has the desired pharmacokinetics/pharmacodynamics properties or efficacy in vivo.

      In this ms we did not conduct in-vivo experiments. When moving forward, pharmacokinetics/pharmacodynamics properties and efficacy will be tested as well.

      (3) Computational approaches are only used to design a phage-screen library, but not used to prioritize mutations that are likely to improve binding (e.g. based on predicted impact on the stability of the interaction). A demonstration of how computational pre-screening or lead optimization can improve the time-intensive process would be a welcome advance.

      Thank you for this important comment. In the present ms we indeed used a computational approach for prioritizing residues to be mutated, but we did not prioritize the mutations that are likely to improve binding. In the initial library design, we did prioritize the mutations. However, due to limitations of the experimental approach in codon selection for the library, we decided to allow all possible residues in each position, knowing that the selection would remove non-binding variants.

      Context:

      The conflict of interest statement is inadequate. Most authors of the study (but not the first author) are employees of Biolojic, a company developing multi-specific antibodies, but the statements do not clarify whether the presented antibodies represent Biolojic IP, whether the company sponsored the research, and whether the company is further developing the specific antibodies presented.

      The Conflict-of-Interest statement will be revised as such: The Biolojic Design authors are employees of Biolojic Design and have stock options in Biolojic Design. The company did not sponsor the research, does not hold IP for the presented antibodies, and is not further developing the presented antibodies.

      Reviewer #2 (Public Review):

      Strengths:

      (1) Deep computational analyses of large datasets of clinical data provide useful information about HER2 mutations and their potential relevance to antibody therapy resistance.

      (2) There is valuable information analyzing the residues within or near the interface between the antigen HER2 and the Pertuzumab antibody (heavy chain). The experimental antibody library screening obtained 90+ clones from 3.86×10¹¹ sequences for further functional validation.

      Weaknesses:

      (1) There is a lack of assessment for antibody variant functions in cancer cell phenotypes in vitro (proliferation, cell death, motility) or in vivo (tumor growth and animal survival). The only assay was the western blotting of phospho-HER3 in Figure 4. However, HER2 levels and phospho-HER2 were not analyzed.

      We indeed did not assess the function of the engineered antibodies in cancer cells. While a complete signaling assessment obviously requires functional assessment as well, due to the complexity of this assay, papers in this field (for example [1-3]) measure the signaling activation following HER2-HER3 dimerization by measuring pHER3, and we relied on them in this ms.

      (2) There is a misleading impression from the title of computational engineering of a therapeutic antibody and the statement in the abstract "we designed a multi-specific version of Pertuzumab that retains original function while also binding these HER2 variants" for a few reasons:

      a. The primary method used for variant antibody identification for HER2 mutant binding is rather traditional experimental screening based on yeast display instead of the computational design of a multi-specific version of Pertuzumab.

      b. There is insufficient or lack of computational power in the antibody design or prioritization in choosing variant residues for the library construction of 3.86×10¹¹ sequences. It seems random combinations from 6 residues out of 4 groups with 20 amino acid options.

      c. The final version of the tri-binding variant is a combination of screened antibody clones instead of computation design from scratch.

      d. There is incomplete experimental evidence about the therapeutic values of newly obtained antibody clones.

      Thank you for this relevant comment. When addressing relevant residues to be mutated, the number of potential variants is enormous. The computational approach was aimed at identifying the most preferable residues, in which variation can improve binding and is not likely to harm important interactions. Although an initial smaller number of residues could have been chosen, we decided to broaden our view and create a larger library, with the aim of combining the computational selection with an experimental selection. This indeed is not a computational design from scratch, but rather an interplay between computation and experiment that yielded the presented results.

      (3) Figures can be improved with better labeling and organization. Some essential pieces of data such as Supplementary Figure 1B on HER2 mutations in S310 that abrogated its binding to Pertuzumab should be placed in the main figures.

      Thank you for this comment, the relevant figures were moved to the main text, and the labels were revised.

      (4) It is recommended to provide a clear rationale or flowchart overview into the main Figure 1. Figure 2A can be combined with Figure 1 to the list of targeted residues.

      Figures 1 and 2 were divided differently, and the rationale was moved to the main text.

      (5) The quality of Figures such as Figure 2B-C flow data needs to be improved.

      High-quality figures were submitted with the revised ms.

      Reviewer #1 (Recommendations for The Authors):

      Major:

      (1) It should be clarified whether the S310 somatic mutations represent resistance mutations to Pertuzumab (i.e. emerge post-therapy) or are general mutations that activate HER2. This is important because mutations that specifically "evade" the binding of an antibody may be substantially more difficult to overcome than mutations that only by chance occur in the antibody binding site. This concern should be addressed in the introduction and discussion as it changes the interpretation of the data.

      This is a very important note. To the best of our knowledge, these mutations were not identified as resistance mutations that emerged post-therapy. However, as mentioned in the introduction, these mutations form hydrophobic interactions that stabilize HER2 dimerization. Moreover, cells expressing these mutations show hyperphosphorylation of HER2 and an increase in the subsequent activation of signaling pathways. Thus, these mutations do not necessarily evade Pertuzumab binding, but benefit cancer growth. This point was clarified in the introduction of the revised text.

      (2) While the authors claim that S310 germline pathogenic variants exist, I could not find evidence that this is the case. The dbGAP ID does not provide any evidence (either in the form of a citation or prevalence). The variants do not exist in GnomAD. A recent article discussing pathogenic ERBB2 germline variants only mentions S310 as a somatic variant https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8268839/ and I could not find evidence for S310 being a germline variant in the references provided by the author (https://www.nature.com/articles/nbt.3391) - where it is only mentioned as a somatic mutation. I could not find evidence of a cancer predisposition syndrome associated with this variant.

      Thank you for highlighting this matter. We had assumed that the presence of the variant in dbSNP means it is also a germline mutation, which may not be correct. However, we did find some evidence of this mutation as germline in ClinVar, and this was edited in the revised ms. https://www.ncbi.nlm.nih.gov/clinvar/RCV001311879.7.

      (3) The authors should consider experiments that show that the modified Pertuzumab has the same mechanism of action as the original Pertuzumab in preventing dimerization of the ERBB2 homodimer and/or interactions with ERBB3. I cannot recommend a specific approach, but at present it is not clear whether the mechanism or just the effect (phosphorylation of ERBB3) is the same.

      As mentioned above, for the assessment of HER2-HER3 binding and HER3 signaling, in this ms we relied on previous works [1-3] that also measured the signaling activation following HER2-HER3 dimerization by measuring pHER3.

      (4) The authors should perform in vitro experiments to demonstrate that the engineered antibody has similar on-target specificity, not only sensitivity. I don't know what the ideal experiments would be, but they should probably probe native epitopes. Western blots, immunoprecipitation of cell lysates?

      As mentioned above, showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in future work.

      Minor:

      (1) The introduction should review better the literature on the computational/rational design of antibodies, especially multi-specific - and likely de-emphasize small molecules (and mutations associated with the resistance thereof) as the presented research does not inform the design of mutation-agnostic small molecules.

      Thank you for these comments, the introduction was revised accordingly.

      (2) The authors should better present the fact that the lack of binding of Pertuzumab to HER2 S310 was previously known, thus the whole strategy of searching COSMIC and computationally predicting the binding impact was unnecessary. Rather it would be helpful to learn how many other COSMIC hotspots could have a similar effect on other clinical antibodies.

      The lack of binding was indeed previously known, as mentioned in the introduction. However, we did not start our analysis targeting HER2 specifically, but we rather found these mutations because they were located in the binding pocket, which enabled our strategy to compensate for these mutations with alteration of the original Pertuzumab. Regarding other potential hotspots, the numbers appeared in Supplementary Table 1, and were moved to the main text.

      Stylistic:

      (1) Avoid using the term "drug" for an antibody.

      The term was changed to “antibody therapeutics” in the revised text.

      (2) Avoid repetition in the introduction.

      Thank you, we revised the introduction with this comment in mind.

      Reviewer #2 (Recommendations For The Authors):

      The quality of Figure 2B-C flow data needs to be improved:

      a. The diagonal populations suggest inappropriate color compensation or indicate cells are derived from unhealthy populations.

      We believe there may be some confusion here. The figures you are referring to are figures of a very diverse library. The selected clones show nice diagonals, as shown in Supplementary Figure 5.

      b. Additional round 3 and round 4 did not seem to improve the enrichment of targeted clones but rather had similar binding profiles to each of the three proteins over and over.

      Two sets of the fourth round of selection were done, each originating from a different sub-population in round 3: (1) clones that bind the S310Y mutation and (2) clones that bind the S310F mutation. The aim of R4 was to examine these binders against the second mutation and canonical HER2 in the search for multi-specificity. Additional clarification of this point will be added to the main text.

      c. Figure legends are vague with non-specific descriptions of cells and conditions, and unclear statements of "FACS results...".

      The legends were edited in the revised version.

      d. Text fonts are in low resolution.

      High-quality figures were submitted with the revised ms.

      (1) Diwanji, D., et al., Structures of the HER2-HER3-NRG1β complex reveal a dynamic dimer interface. Nature, 2021. 600(7888): p. 339-343.

      (2) Yamashita-Kashima, Y., et al., Mode of action of pertuzumab in combination with trastuzumab plus docetaxel therapy in a HER2-positive breast cancer xenograft model. Oncol Lett, 2017. 14(4): p. 4197-4205.

      (3) Kang, J.C., et al., Engineering multivalent antibodies to target heregulin-induced HER3 signaling in breast cancer cells. MAbs, 2014. 6(2): p. 340-53.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif" where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo-structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying the AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Strengths:

      This is a nice demonstration of the utility of the authors' previously published RAVE method.

      Weaknesses:

      My only concern is the authors' discussion of induced fit. I'm quite confident the structures discussed are present in the absence of ligand binding, consistent with conformational selection. It seems the authors' own data also argue for an important role for conformational selection. It would be nice to acknowledge this instead of going along with the common practice in drug discovery of attributing any conformational changes to induced fit without thoughtful consideration of conformational selection.

      The reviewer is correct. We aim to highlight the significant role of conformational selection. To clarify this, we have expanded the discussion on conformational selection in the introduction.

      Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structures ensemble and small molecule binding of metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      Proteins undergo thermodynamic fluctuations and can occasionally reach metastable configurations. It can be assumed that other biomolecules, such as proteins and DNA, stabilize these metastable states when forming protein-protein or protein-DNA complexes. Since our method has the potential to identify these metastable states, it shows promise for designing drugs targeting proteins in allosteric configurations induced by other biomolecules.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensembles, which is essential for designing targeted drugs. The utility of the integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions

      (1) The computational protocol is found to be insufficient to generate precise values of the relative free energies between structures generated. The authors note in the Conclusion that an enhancement in the workflow might result from the addition of free energy calculations. Can the authors comment on the prospects for generating more accurate estimates of the free energy that might be used to qualitatively evaluate poses and the free energy landscape surrounding putative metastable states? What are the principal challenges and what might help overcome them? What would the most effective computational protocol be?

      More accurate estimates of the free energy can theoretically be achieved by increasing the number of umbrella sampling windows and extending the simulation length until the PMF converges. However, there is always a trade-off between PMF accuracy and computational cost, so we have chosen to keep the current setup. Metadynamics is another method for obtaining a more accurate free energy profile, and we used it in previous versions of AlphaFold2-RAVE. For the specific systems we investigated, however, it had difficulty achieving back-and-forth movement, given the highly entropic nature of the activation loop. Research in enhanced sampling methods and dimensionality reduction techniques for reaction coordinates is continually evolving and will play a critical role in alleviating this problem.

      (2) I was surprised that there was not more correlation of a funnel-like shape in Figures S16 and S18, showing a stronger correlation between low RMSD and better docking score. This is true for both the ponatinib and imatinib applications in DDR1 and Abl1. That also seems true for the trimmed results for Src kinase in Figure S19. I was also surprised that there are structures with very large RMSD but docking scores comparable to the best structures of the lowest RMSD. Might something be done to make the docking score a more effective discriminator?

      The docking algorithm and docking score are used to filter out highly improbable docking poses. False positives in predicted docking poses are a common issue across all docking methods as described for instance in:

      Fan, Jiyu, Ailing Fu, and Le Zhang. "Progress in molecular docking." Quantitative Biology 7 (2019): 83-89.

      Ferreira, R.S., Simeonov, A., Jadhav, A., Eidam, O., Mott, B.T., Keiser, M.J., McKerrow, J.H., Maloney, D.J., Irwin, J.J. and Shoichet, B.K., 2010. "Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors." Journal of medicinal chemistry, 53(13), pp.4891-4905.

      Moreover, there is always a trade-off between docking accuracy and computational cost. While employing more accurate docking methods may decrease false positives, it can also be resource-intensive. In such scenarios, our approach to enriching holo-structures can be impactful by reducing the number of pocket structures in the input ensembles and significantly enhancing docking efficiency.
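
      The reviewer's question about the relationship between RMSD and docking score can be made quantitative with a rank correlation. The following is a minimal sketch, not the analysis used in the manuscript: the function names and the RMSD and score values are hypothetical and serve only to illustrate the calculation. A coefficient near +1 would correspond to the funnel-like trend the reviewer expected (lower RMSD going with more negative scores), whereas a value near 0 would indicate that the score discriminates poorly.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for illustration only (not data from the manuscript).
rmsd = [0.8, 1.2, 2.5, 3.1, 4.0, 6.5]                  # ligand RMSD
score = [-12.1, -11.4, -9.0, -11.8, -7.5, -11.0]       # docking score
print(f"Spearman rho = {spearman(rmsd, score):.2f}")
```

      Because the coefficient uses only ranks, it is insensitive to the absolute scale of the docking score, which is convenient when comparing ensembles docked with different ligands.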

      (3) I think that it is fine to identify one structure as "IFD winner" but also feel that its significance is overstressed, especially given that it can be identified only in a retrospective analysis rather than through de novo prediction.

      We agree with the reviewer. We did not intend to emphasize the specific structure "IFD winner". Rather, we aimed to demonstrate that our method can enrich promising candidates for holo-structures. We verified this by showing that our holo-structure candidates performed well in retrospective docking using IFD, which we previously referred to as "IFD winner". We have now revised this term to "holo-model".

      Minor Points

      p. 3 "DymanicBind" should be "DynamicBind"

      p. 3 Change "We chosen" to "We have chosen" or "we chose."

      p. 3 In identifying the Schrödinger software Glide and IFD, I recommend removing the subjective modifier "industry-leading."

      Modifications done.

      Reviewer #2 (Recommendations For The Authors):

      In the view of this reviewer, the writing is 'choppy'.

      We have tried to improve the writing.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 1, the workflow labels (i) to (iv) are not shown on the figures, making it difficult for readers to follow. Consider adding these labels to the figures.

      Modifications done.

      (2) Explain how Boltzmann ranks were calculated based on unbiased MD simulations to guide the enrichment of holo-like structures in metastable states.

      The Methods section is now updated for clarification.

      (3) The authors could clarify how the classical DFG-out decoys in the DDR1 rMSA AF2 ensemble are transferred to Abl1 kinase in the Methods section.

      The Methods section is now updated for clarification.

      (4) The authors can clarify the methodology section by providing more detailed explanations about how the unbiased MD simulations are performed, including which MD simulation software was used and whether energy minimization and equilibrium steps were needed as in conventional MD simulations, and other setup details.

      The Methods section is now updated for clarification.

      (5) The validation of the proposed approach in this work used three kinase proteins. The authors can enhance the discussion section by addressing other types of protein structure prediction that can use the proposed approach in drug discovery, beyond the three kinase proteins tested.

      The proposed approach is theoretically applicable to other types of proteins, such as GPCRs, where both conformational selection and the induced-fit effect are crucial. We have expanded the discussion on the generalization of our protocol in the Conclusion section.

      (6) The authors should add appropriate citations for the software and tools used in the manuscript. For example, a reference should be added for the Glide XP docking experiments that utilized the Maestro software. Double-check all related software citations.

      We have now updated the citations for the docking experiments, following the instructions in the Maestro Glide user manual and the IFD user manual.

      (7) The authors should consider offering a comprehensive list of software tools and databases utilized in the study to assist in replicating the experiments and further validating the results.

      We have now added a summary of tools used in the Methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to Reviewer #1:

      Reviewer #1: The study shows a new mechanism of NFkB-p65 regulation mediated by Vangl2-dependent autophagic targeting. Autophagic regulation of p65 has been reported earlier; this study brings an additional set of molecular players involved in this important regulatory event, which may have implications for chronic and acute inflammatory conditions.

      Comments on the revised version:

      The authors have addressed the earlier concerns and I am satisfied with the revised version. I have no additional comments to make.

      We appreciate the reviewer’s comments on our revised manuscript.

      Responses to Reviewer #2:

      Reviewer #2: Vangl2 is a core planar cell polarity protein involved in Wnt/PCP signaling, cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, it was shown that Vangl2 interacts with the autophagy regulator p62, and autophagic degradation limits the activity of inflammatory mediators, such as p65/NF-κB. However, the possible role of Vangl2 in inflammation has not been investigated. In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Their mechanistic studies further revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitated the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes caused selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity in myeloid cells. Overall, the manuscript presents convincing evidence for novel Vangl2-mediated control of inflammatory p65/NF-kB activity. The proposed pathway may expand interventional opportunities restraining aberrant p65/NF-kB activity in human ailments.

      IKK is known to mediate p65 phosphorylation, which instructs NF-kB transcriptional activity. In this manuscript, Vangl2 deficiency led to an increased accumulation of phosphorylated p65 and IKK also at 30 minutes post-LPS stimulation; however, autophagic degradation of p-p65 may not have been initiated at this early time point. Therefore, this set of data put forward the exciting possibility that Vangl2 could also be regulating the immediate early phase of inflammatory response involving the IKK-p65 axis - a proposition that may be tested in future studies.

      We appreciate the reviewer’s comments on our manuscript, and we have added a discussion of the IKK-p65 axis in the revised version. (Page 15, lines 467-474)

      Responses to Reviewer #3:

      Reviewer #3: Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, these findings are novel, valuable and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. While generally solid, some concerns still remain about the rigor and conclusions drawn.

      Comments on the revised version:

      (1) Lu et al. address my comments through responses and new experimental data. However, some of the explanations provided are inadequate.

      However, in response to my enquiry regarding directly exploring PCP effects, the authors simply assert "Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by autophagy receptor NDP52 and then promotes the autophagic degradation of p65. Our findings by using autophagy inhibitors and autophagic-deficient cells indicate that Vangl2 regulates NFkB signaling through a selective autophagic pathway, rather than affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or even mechanical tension."

      I do not agree that the use of autophagy inhibitors and autophagy-deficient cells can rule out the contributions of PCP or any other pathways. Only experimentally inhibiting the pathway(s) with adequate demonstration of target inhibition/abolition of well-known effector function and documenting unaltered p65 regulation under these conditions can be considered proof. Autophagy inhibitors and autophagy-deficient cells only prove that this particular pathway is necessary. Nonetheless, I do not want to dwell on proving a negative and agree that Vangl2 is a novel regulator of p65 through its role in promoting p65 degradation. The inclusion of a statement discussing the limitations of their approach would have sufficed. The response from the authors could have been better.

      We thank the reviewer for helping us improve the quality of the manuscript. We provided new data and revised the Discussion as suggested.

      To ascertain whether Vangl2 degrades p65 through a selective autophagic pathway or the PCP pathway, 293T cells were transfected with p65, with or without the Vangl2 plasmids, and treated with different pharmacological inhibitors. We found that the degradation of p65 induced by Vangl2 was blocked by the autolysosome inhibitor (CQ), but not by the JNK inhibitor (SP600125) or the Wnt/β-catenin inhibitor (FH535) (New Figure 1). These data suggest that Vangl2 primarily degrades p65 through a selective autophagic pathway, rather than through the JNK or Wnt signaling pathway. Nevertheless, inhibition of additional pathways, such as the HH/GLI and Fat-Dachsous pathways, should also be employed to further elucidate the function of Vangl2 in p65 degradation. As suggested, we have added a statement about the limitations of the approach in the Discussion (Page 12, lines 378-385).

      Author response image 1.

      Vangl2 degrades p65 through a selective autophagic pathway, but not by the PCP pathway. HEK293T cells were transfected with Flag-p65 and HA-Vangl2 plasmids, and treated with DMSO, CQ (50 mM) for 6 h, SP600125 (20 mM) for 1 h or FH535 (30 mM) for 6 h. The cell lysates were analyzed by immunoblot.

      (2) I am also not satisfied with the explanation that "immune cells represent a minor fraction of the lungs and liver". There are lots of resident immune cells in the lungs and liver (alveolar macrophages in the lung and Kupffer cells in the liver). For example, it may be so that Vangl2 is important in monocytes and not in the resident population. This might be a potential explanation. But this is not explored. The restricted tissue-specificity of the interaction between two ubiquitously present proteins is still a challenge to understand. The response from the authors is not satisfactory. There is plenty of Vangl2 in the liver in their western blot.

      We thank the reviewer for this question. We added this explanation in the Discussion. (Page 13, lines 398-404)

      (3) I had also simply pointed out PMID: 34214490 with reference to the findings described in the manuscript. There were no suggestions of contradiction. In fact, I would refer to the publication in discussion to support the findings and stress the novelty. The response from the authors could have been better.

      We thank the reviewer for these insightful comments. We have modified the discussion as suggested. (Page 13, lines 410-415; Page 14, lines 419-421)

      (4) The response to my enquiry regarding homo- or heterozygosity is unsupported by any reference or data.

      As suggested, we provide data in New Figure 2 showing that only homozygous Vangl2-deficient mice showed inhibition of the activation of NF-kB.

      Author response image 2.

      Vangl2 deficiency promotes NF-kB activation. (A) The survival rates of WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT mice treated with a high dose of LPS (30 mg/kg, i.p.) (n≥4). (B) IL-6 and TNF-α secretion by WT and Vangl2-deficient BMDMs treated with LPS for 6 h was measured by ELISA. IL-1β secretion by WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT BMDMs treated with LPS for 6 h and ATP for 30 min was measured by ELISA.

      (5) The listing of 8 patients and healthy controls is also appreciated. The body temperature of #6 doesn't fall in the <36 or >38 degree C SIRS criteria. The inclusion of CRP, PCT, heart rate and respiratory rate, and other lab values would have further improved the inclusion criteria. Moreover, it is difficult to understand why there are 16 value points for healthy and sepsis cohorts in Fig 1 when there are 8 patients.

      We thank the reviewer for this valuable suggestion. We apologize for our mistake: we had entered data from two repeated experiments in Figure 1A, and we have corrected these data in the updated version (Figure 1A, Page 12, line 146). As suggested, we have added CRP, WBC and heart rate to the sepsis patients’ information. (Supplementary Materials and Methods)

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The proposition that Vangl2 may target additional mediators of inflammation could be indicated in the text.

      We thank the reviewer for this valuable suggestion. We have added this discussion in the revised version. (Page 15, lines 467-474)

      Reviewer #3 (Recommendations For The Authors):

      It is advised that some of the deficiencies pointed out by Reviewer #3 are textually addressed. Additionally, there could be some inconsistency in the number of healthy controls and patients (see Fig S1A and Fig 1A and Supplementary table, also see comments from Reviewer #3) - this should be carefully scrutinised and revised, if necessary.

      We thank the reviewer for this valuable suggestion. We apologize for our mistake: we had entered data from two repeated experiments in Figure 1A, and we have corrected these data in the updated version (Figure 1A, Page 12, line 146).

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife assessment:

      In this useful study, the authors analyze droplet size distributions of multiple protein condensates and their fit to a scaling ansatz, highlighting that they exhibit features of first- and second-order phase transitions. The experimental evidence is still incomplete as the measurements were apparently done only at one time point, neglecting the possibility that droplet size distribution can evolve with time. The text would benefit from a connection to and contextualization with the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community and that emphasizes "liquid-gas" criticality. 

      We have now carried out new experiments at multiple time points to establish that the droplet size distributions are stationary below the critical concentration. We have also addressed the comments made by the reviewers about the nature of the phase transition.

      Our analysis does not depend on a specific hypothesis on the nature of the phase transition, whether it be percolation or a gas-liquid critical transition. The scaling that we observed is an emergent property that is independent from the possible theoretical models used to describe the phase transition. In fact, our scaling analysis indicates that any theoretical model proposed for protein phase separation should predict the critical exponents that we reported. 

      Reviewer #1

      The authors analyse droplet size distributions of multiple protein condensates and fit to a scaling ansatz to highlight that they exhibit features of first-order and second-order phase transitions. While the experimental evidence is solid, the text lacks connection and contextualization to the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community. The evidence supports the percolation and phase separation model rather than being close to a true critical point in the liquid-gas phase space. Overall, the work is useful to the community.

      We are grateful to the reviewer for these positive comments. We would like to emphasise that our contribution is not to propose a theoretical model, but rather to report a scaling behaviour in the experimentally measured droplet size distributions. The main implication of our work is that any theoretical model should predict the scaling exponents that we derived from the experimental measurements.

      Strengths: 

      The experimental analysis of distinct protein condensates is very well done and the reported exponents/scaling framework provides a clear framework to help the community deconvolve signatures of percolation in condensates. 

      Weaknesses: 

      The principal concern this reviewer has is that the authors adopt a framing in this paper to present a discovery of second-order features and connections to criticality - however, they ignore/miss the connections to percolation (a well-understood second-order transition that is expected to play a major role in protein condensates). I believe this needs to be addressed and the paper suitably revised to help connect with these expectations. 

      The scaling that we found is not characteristic of standard percolation, since the exponents that we obtained (a=0 and f=1) differ from those of percolation (a=1.19 and f=2.21). This difference indicates that protein phase separation is not in the same universality class as standard percolation. Further studies will be required to understand whether theoretical models based on percolation could predict the observed critical exponents.

      - Protein condensates have been increasingly understood to be described as fluids whose assembly is driven by a connection of density (phase separation, first-order) and connectivity (percolation, second-order) transitions. This has been long known in the polymer community (Flory, Stockmayer, Tanaka, Rubinstein, Semenov, and others) and recently repopularized in the condensate community (by Pappu and Mittag, in particular, amongst others). The authors make no connections to any of these frameworks - which actually seem to be the essence of what they are describing. 

      As mentioned above, our purpose was neither to support an existing theoretical model, nor to propose a new one. Rather, we have reported a scaling behaviour and scaling exponents not noted before. Further studies will be required to establish whether existing theoretical models could account for this scaling behaviour.

      - Percolation theory, which has been around for more than half a century, has clear-cut scaling laws that have essentially similar forms to the ansatz adopted by the authors, and the commonalities/differences are not discussed by the authors - this is essential since this provides a physical basis for their ansatz rather than an arbitrary mathematical formulation. In particular, percolation models connect size distribution exponents to factors like dimensionality, valence, etc. and if these connections can be made with this data, that would be very powerful. 

      The scaling ansatz that we are using is commonly adopted in studies of critical phenomena, and it is not specific to percolation. The scaling exponents depend only on a few attributes, such as dimensionality, symmetries, and whether the interactions are short- or long-range. These attributes determine the universality class. As such, scaling does not link to molecular determinants, but it can distinguish between different universality classes.

      - The connections between spinodal decomposition and second-order phase transitions are very confusing. Spinodal decomposition happens when the barriers for first-order phase transitions are zero and systems can phase separate without crossing nucleation barriers. Further, the "criticality" discussed in the paper is confusing since it more likely refers to a percolation threshold and much less likely to a "critical temperature" (Tc - where spinodal and binodals become identical). I would recommend reframing this argument. 

      We cannot refer to a percolation threshold, as our model is not readily compatible with it. We have elaborated on and better explained the differences between these models.

      It's unlikely, in this reviewer's opinion, that the authors are actually discussing a "first-order" liquid-gas critical point - because saturation concentrations of these proteins can be much higher with temperature and the critical point would thus likely be at much higher concentrations (and of course temperature). Further, the scaling exponents don't fall into that class naturally. However, if the authors disagree, I would appreciate clear quantitative reasons (including through the scaling exponents in that universality class) and be happy to be convinced to change my mind. As provided, the data does not support this model. 

      We have now clarified in the manuscript that we do not discuss the liquid-gas critical point.

      Reviewer #2

      This is a potentially interesting study addressing a possible scale-invariant log-normal characteristic of droplet size distribution in the phase separation behavior of biomolecular condensates. Some of the data presented are valuable and intriguing. However, as it stands, the validity and utility of this study are uncertain because there are serious deficiencies in the execution and presentation of the authors' results. Many of these shortcomings are fundamental, including a lack of clarity in the basic conceptual framework of the study, insufficient justification of the experimental setup, less-than-conclusive experimental evidence, and inadequate discussion of implications of the authors' findings to future experimental and theoretical studies of biomolecular condensates. Accordingly, this reviewer considers that the manuscript should undergo a major revision to address the following. In particular, the discussion should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      We thank the reviewer for the helpful comments. In the revised version of the manuscript, we have clarified that we used a well-established tool for studying phase transitions, scaling analysis, and applied it to the protein condensation process. This approach offers insight into a universal aspect of protein phase separation, and also provides a practical way to determine the phase boundary. The fat-tailed distribution of protein droplet sizes that we observed is not what is normally seen in the subsaturated phase of more standard phase separation systems. Our contribution is not to propose a theoretical model, but rather to report the observation of a scaling behaviour. 

      (1) The theoretical analysis in this study is based on experimental data on condensed droplet size distributions for FUS and α-synuclein. The size data for FUS droplet is indirect as it relies on the assumption that FUS droplet diameter is proportional to fluorescence intensity of labeled FUS (page 10 of manuscript), with fluorescence data adopted from a previously published work by another group (Kar et al. & Pappu, ref.27). Because fluorescence of a droplet is expected to be dependent upon the condensed-phase concentration of FUS, this proportional relationship, even if it holds, must also be modulated by FUS concentration in the droplet. Moreover, why should fluorescence be proportional to diameter but not the cross-sectional area or volume of the FUS droplet, which would be more intuitive? These issues should be clarified. A new measure by microscopy is used to determine the size distribution of condensed α-synuclein; but no microscopy image is shown. It is of critical importance that such raw data (for example microscopy images) be presented for the completeness and reproducibility of the experiment because the entire study relies on the soundness of these experimental measurements. 

      As we mentioned in the article, for the scaling analysis, the droplet dimensions could be assessed in 1D (length), 2D (area) or 3D (volume). For the FUS experiments, we used the data as the authors provided in the original publication (PNAS 2022). For alpha-synuclein, we provided the data in the article. 

      (2) Despite the authors' claim of a universal scaling relationship, the log-log scatter plots in Figure 1 (page 15 of the manuscript) exhibit significant deviations from linearity at low protein concentrations (ρ→0). Given this fact, is universal scaling really valid? Discussion of this behavior is conspicuously absent (except the statement that these data points are excluded in the fit). In any case, the possible origins of these deviations should be thoroughly discussed so that the regime of universal scaling can be properly delineated. 

      In general, one would expect the scaling ansatz to be valid close to the phase boundary. It is a feature of the ansatz that, further away from the boundary, deviations are expected because critical phenomena become less relevant.
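
      Restricting a fit to the neighbourhood of the phase boundary can be illustrated with a short sketch. This is not the fitting procedure of the manuscript. It assumes, purely for illustration, that a characteristic droplet size follows a generic power law in the distance from a boundary concentration rho_c; the function names, the cut-off max_distance and the synthetic numbers are all hypothetical.

```python
import math


def fit_power_law(x, y):
    """Least-squares line through (log x, log y).

    Returns (exponent, prefactor) for the model y = prefactor * x**exponent.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, math.exp(my - slope * mx)


def fit_near_boundary(rho, size, rho_c, max_distance):
    """Fit only the points with 0 < rho_c - rho <= max_distance."""
    pairs = [(rho_c - r, s) for r, s in zip(rho, size)
             if 0 < rho_c - r <= max_distance]
    if len(pairs) < 2:
        raise ValueError("need at least two points near the boundary")
    distance, sizes = zip(*pairs)
    return fit_power_law(distance, sizes)


# Synthetic example (hypothetical numbers): size = 3 / (rho_c - rho) near the
# boundary, with one low-concentration point that deviates from the power law.
rho_c = 10.0
rho = [9.5, 9.0, 8.0, 6.0, 2.0]
size = [6.0, 3.0, 1.5, 0.75, 0.05]
exponent, prefactor = fit_near_boundary(rho, size, rho_c, max_distance=4.0)
print(f"exponent = {exponent:.2f}, prefactor = {prefactor:.2f}")
```

      Repeating the fit for several values of the cut-off and checking that the exponent is stable is one simple way to delineate the regime in which a scaling description is appropriate.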

      (3) Droplet size distribution most likely depends on the time duration after the preparation of the sample. For α-synuclein, "liquid droplet size characterisation images were captured 10 minutes post-liquid droplet formation" (page 9 of the manuscript). Why 10 minutes? Have the authors tried imaging at different time points and, if so, do the distributions at different time points remain essentially the same? If they are different, what is the criterion for focusing only on a particular time point? Information related to these questions should be provided. 

      We have now determined the droplet size distribution of alpha-synuclein at different time points, finding that they are not dependent on time within experimental uncertainties (Figure 6 in the revised manuscript).
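
      One simple way to compare size distributions measured at two time points is the two-sample Kolmogorov-Smirnov statistic, the largest vertical distance between the two empirical cumulative distributions. The sketch below is an illustration under stated assumptions, not the analysis reported in the revised manuscript: the function name and the droplet sizes are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Walks through both sorted samples and tracks the largest absolute
    difference between their empirical cumulative distribution functions.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both cursors past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical droplet sizes (arbitrary units) at two time points.
sizes_t1 = [0.4, 0.6, 0.7, 1.1, 1.8, 2.9, 4.5]
sizes_t2 = [0.5, 0.6, 0.8, 1.0, 1.9, 3.1, 4.2]
print(f"KS statistic = {ks_statistic(sizes_t1, sizes_t2):.3f}")
```

      A statistic close to 0 is compatible with a stationary distribution, whereas systematic growth by coalescence or Ostwald ripening would shift the later distribution towards larger sizes and increase the statistic. Judging significance would additionally require the sample sizes and the experimental uncertainties.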

      (4) At least two well-known mechanisms can lead to the time-dependent distribution of liquid droplet sizes: (i) coalescence of droplets in spatial proximity to form a larger droplet, and (ii) Ostwald ripening, i.e., formation of larger droplets concomitant with the dissolution of smaller droplets without fusion of droplets. The implications of these mechanisms on the authors' droplet size distributions should be addressed. Indeed, maintaining a size distribution against these mechanisms in vivo often requires active suppression [Bressloff, Phys Rev E 101, 042804 (2020)] with possible involvement of chemical reactions [Kirschbaum & Zwicker, J R Soc Interface 18, 20210255 (2021)]. These considerations are central to the basic rationale of this study and therefore should be carefully tackled. 

      These two mechanisms of growth are relevant above the critical concentration. Below the critical concentration, which is the regime that we investigated in our work, there is no need for active suppression.

      (5) If coalescence and/or Ostwald ripening do occur, given sufficient time after sample preparation, the condensed phase may become a single large "droplet" or a single liquid layer. Does this occur in the authors' experiments? 

      As we are below the critical concentration, this is unlikely to occur, as indeed supported by the experiments mentioned at point (3). 

      (6) It is unclear whether the authors aim to address the kinetic phenomenon of liquid droplet formation and evolution or equilibrium properties. The two types of phenomena appear to be conflated in the authors' narrative. Clarification is needed. If this work aims to address time-independent (or infinite-time) equilibrium properties, how are they expected to be related to droplet size distribution, which most likely is time-dependent? 

      Our analysis focuses on the equilibrium properties of the droplet size distribution below the critical concentration, and it should guide the development of a theoretical model that explains the emergence of scaling. In the introductory part of our manuscript, we proposed a possible scenario that extends the Flory-Huggins theory to predict a scaling behaviour appropriate to a critical transition. Other scenarios are possible, and our results, along with further experiments, are needed to arrive at a deeper understanding of protein aggregation.

      (7) The relationship between the potentially time-dependent droplet size distribution and equilibrium properties of ρt and ρc (transition and critical concentrations, respectively) should be better spelled out. An added illustrative figure will be helpful. 

      We are addressing equilibrium properties, not kinetic ones. See also the answer to point (6).

      (8) The authors comment that their findings appear to be inconsistent with Flory-Huggins theory because Flory-Huggins "characterizes droplet formation as a consequence of nucleation ..." (page 8 of the manuscript). Here, three issues need detailed clarification: (i) In what way does Flory-Huggins mandate nucleation? (ii) Why are the findings of apparent scale invariance inconsistent with nucleation? (iii) If liquid droplet formations do not arise from nucleation, what physical mechanism(s) is (are) envisioned by the authors to be underpinning the formation of condensed liquid droplets in protein phase separation? 

      We do agree that the Flory-Huggins theory does not mandate nucleation above the spinodal line. However, we are addressing the equilibrium properties below the critical concentration, so the stable phase is the dilute phase, and there is no nucleation.

      (9) Are any of the authors' findings related to finite-system effects of phase separation [see, e.g., Nilsson & Irbäck, Phys Rev E 101, 022413 (2020)]?  

      Our experimental system is macroscopic, so we would not expect finite size effects.

      (10) Since the authors are using their observation of an apparent scale-invariant droplet size distribution to evaluate phase separation theory, it is important to clarify whether their findings provide any constraint on the shape of coexistence curves (phase diagrams). 

      We are only reporting the phenomenological observation of a scaling behaviour, so we may not speculate at this stage on the constraints of the coexistence curves. This is indeed an exciting opportunity for future studies.

      (11) More specifically, do the authors' findings suggest that the phase diagrams predicted by Flory-Huggins are invalid? Or, are they suggesting that even if the phase diagrams predicted by Flory-Huggins are empirically correct (if verified by experimental testing), they are underpinned by a free energy function different from that of Flory-Huggins? It is important to answer this question to clarify the implications of the authors' findings on equilibrium phase behaviors and the falsifiability of the implications. 

      As mentioned above, our main conclusion is that the droplet size distribution follows a scaling behaviour. Our contribution is not to propose a theoretical model, but rather to report a scaling behaviour that should be accounted for by existing or future theoretical models.

      (12) How about the implications of the authors' findings on other theories of protein phase separation that are based on interactions that are different from the short spatial range interactions treated by Flory-Huggins? For instance, it has been observed that whereas the Flory-Huggins-predicted phase diagrams are always convex upward, phase diagrams for charged intrinsically disordered proteins with long spatial range Coulomb interactions exhibit a region that is concave upward [Das et al., Phys Chem Chem Phys 20, 28558-28574 (2018)]. Can information be provided by the authors' findings regarding apparent scale-invariant droplet size distribution on the underlying interaction driving the protein molecules toward phase separation? 

      This is an interesting point for future studies about the type of interactions that give rise to the observed scaling behaviour.

      (13) Table S1 (page 4) and Table S2 (page 7) are mentioned in the text but these tables are not in the submitted files. 

      We have added the Supplementary Tables as well as the source files for the figures.

      (14) The two systems studied (FUS and α-synuclein) have a single intrinsically disordered protein (IDP) component. It is not clear if the authors expect their claimed scaling relation to be applicable to systems with multiple IDP components and if so, why.

      From the data that we have currently analysed, we feel that we may not speculate on this interesting point, leaving it to future studies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses. Compared to an earlier version of the paper, the strength of evidence has improved but it is still partially incomplete due to a few key missing experiments and controls.

      We would like to thank the editorial team for their positive comments and constructive suggestions on improving our manuscript. We have made further improvements based on the valuable suggestions of the reviewers, and we are pleased to send you the revised manuscript now. After revising the manuscript and further supplementing with experiments, we think that our existing data can support our claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts. Additionally, it is noted that the main claims put forth in the manuscript are only partially supported by the data presented.

      After meticulous revisions of the manuscript, including adjustments to the title, abstract, results, and discussion, the main claim of our study is now the arms race between the MDA5 receptor and the SCRV virus in a lower vertebrate fish, M. miiuy. This comprises two parts. First, the MDA5 of M. miiuy can detect virus invasion and initiate the host immune response by recognizing the triphosphate structure of SCRV. Second, as an adversarial strategy, the 5’ppp-RNA SCRV virus can utilize the m6A mechanism to degrade MDA5 in M. miiuy. Based on the reviewer's suggestions, we have further supplemented the critical experiments (Figure 3F-3G, Figure 4D, Figure 5G) and provided a more detailed and accurate explanation of the experimental conclusions. We believe that our revised manuscript now supports our main claims. In addition, because virus-host coevolution complicates the derivation of universal conclusions, we will further expand our insights in future research.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      One critical caveat in this study is that it does not address whether ppp-SCRV RNA induces IRF3-dimerization and type I IFN induction in an MDA5 dependent manner. The data demonstrate that mmiMDA5 can bind to triphosphorylated RNA (Fig. 4D). In addition, triphosphorylated RNA can dimerize IRF3 (4C). However, a key experiment that ties these two observations together is missing.

      Specifically, although Fig. 4C demonstrates that 5'ppp-SCRV RNA induces dimerization (unlike its dephosphorylated or capped derivatives), this does not prove that this happens in an MDA5-dependent manner. This experiment should have been done in WT and siMDA5 MKC cells side-by-side to demonstrate that the IRF3 dimerization that is observed here is mediated by MDA5 and not by another (unknown) protein. The same holds true for Fig. 4J.

      Thank you for the referee's professional suggestions. In fact, we have transfected SCRV RNA into WT and si-MDA5 MKC cells, and subsequently assessed the dimerization of IRF3 and the IFN response (Figure 2P-2Q). The results indicated that knockdown of MDA5 prevents immune activation of SCRV RNA. However, considering the potential for SCRV RNA to activate immunity independent of the triphosphate structure, this experimental observation does not comprehensively establish the MDA5-dependent induction of IRF3 dimer by 5’ppp-RNA. Accordingly, in accordance with the referee's recommendation, we proceeded to investigate the inducible activity of 5'ppp-SCRV on IRF3 dimerization in WT and si-MDA5 MKC cells, revealing that 5'ppp-SCRV indeed elicits immunity in an MDA5-dependent manner (Figure 4D). Additionally, poly(I:C)-HMW, a known ligand for MDA5, demonstrated a residual, albeit attenuated, activation of IRF3 following MDA5 knockdown, potentially attributed to its capacity to stimulate immunity through alternative pathways such as TLR3.

      - Fig 1C-D: these experiments are not sufficiently convincing, i.e. the difference in IRF3 dimerization between VSV-RNA and VSV-RNA+CIAP transfection is minimal.

      We have reconstituted the necessary materials and repeated the pertinent experiments depicted in Fig 1C-1D. The results demonstrate that SCRV-RNA+CIAP and VSV-RNA+CIAP exhibit a mitigating effect on the induction activity of SCRV-RNA and VSV-RNA on IRF3 dimerization, albeit without complete elimination (Figure 1C and 1D). These findings suggest the presence of receptors within M. miiuy and G. gallus capable of recognizing the viral triphosphate structure; however, it is worth noting that RNA derived from SCRV and VSV viruses does not exclusively depend on the triphosphate structure to activate the host's antiviral response.

      Fig. 2N and 2O: why did the authors decide to use overexpression of MDA5 to assess the impact of STING on MDA5-mediated IFN induction? This should have been done in cells transfected with SCRV or polyIC (as in 2D-G) or in infected cells (as in 2H-K). In addition, it is a pity that the authors did not include an siMAVS condition alongside siSTING, to investigate the relative contribution of MAVS versus STING to the MDA5-mediated IFN response. Panel O suggests that the IFN response is completely dependent on STING, which is hard to envision.

      In our previous laboratory investigations, we have substantiated the induction effect of STING on IFN under SCRV infection or poly(I:C) stimulation, as documented in the relevant literature (10.1007/s11427-020-1789-5), which we have referenced in our manuscript (lines 177-178). While we did assess the impact of STING on MDA5-mediated IFN induction in SCRV-infected cells, as indicated in the figure legends, we have revised Figure 2N-2O for improved clarity, and similarly, Figure 1H-1I has also been updated. Furthermore, considering that RNA virus infection can activate the cGAS/STING axis (10.3389/fcimb.2023.1172739) and the significant role of MAVS in sensing RNA virus invasion in the NLR pathway (10.1038/ni.1782), it is challenging to ascertain the respective contributions of STING and MAVS to the immune signaling cascade mediated by MDA5 during RNA virus infection. We intend to explore this aspect further in future research endeavors.

      Fig. 3F and 3G: where are the mock-transfected/infected conditions? Given that ectopic expression of hMDA5 is known to cause autoactivation of the IFN pathway, the baseline ISG levels should be shown (ie. In absence of a stimulus or infection). Normalization of the data does not reveal whether this is the case and is therefore misleading.

      Based on the reviewer's suggestions, we have rerun the experiment. We examined the effects of MDA5 and MDA5-ΔRD on antiviral factors in uninfected, SCRV-infected, and poly(I:C)-HMW-stimulated MKC cells. The results showed that overexpression of either MDA5 or MDA5-ΔRD stimulated the expression of antiviral genes. However, when cells were infected with SCRV or stimulated with poly(I:C)-HMW, only the overexpression of MDA5, not MDA5-ΔRD, significantly increased the expression of antiviral genes (Figure 3F-3I).

      Fig. 4F and 4G: can the authors please indicate in the figure which area of the gel is relevant here? The band that runs halfway the gel? If so, the effects described in the text are not supported by the data (i.e. the 5'OH-SCRV and 5'pppGG-SCRV appear to compete with Bio-5'ppp-SCRV as well as 5'ppp-SCRV).

      Apologies for any confusion. The relevant areas in the gel pertaining to the experimental findings were denoted with asterisks and elaborated upon in the figure legends (Figure 4G, 4H, and 4M). The findings indicated that 5'ppp-SCRV, in contrast to 5'OH-SCRV and 5'pppGG-SCRV, demonstrated the ability to compete with bio-5'ppp-SCRV.

      My concerns about Fig. 5 remain unaltered. The fact that MDA5 is an ISG explains its increased expression and increased methylation pattern. The authors should at the very least mention in their text that MDA5 is an ISG and that their observations may be partially explained by this fact.

      First, as our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, changes in the expression level of MDA5 can partially explain the increase in m6A modification across all MDA5 mRNA in cells, but they cannot explain changes in m6A modification on each MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature (Lines 606-608). In addition, we now state in the experimental results that MDA5 is an ISG (lines 260-261), and we emphasize in the discussion section that this is compatible with the enhanced m6A modification of MDA5 (lines 405-409).
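      The normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' analysis pipeline: the function names and all numerical values are hypothetical, and the "signal" arguments stand for whatever linear-scale abundance measure is used for the immunoprecipitated (IP) and input fractions. The made-up numbers are chosen so that total transcript abundance triples upon infection while the normalized m6A level only doubles, showing that dividing by input separates per-transcript modification from expression level.

```python
def percent_of_input(ip_signal: float, input_signal: float) -> float:
    """Percent of the input material recovered in the m6A IP fraction."""
    if input_signal <= 0:
        raise ValueError("input_signal must be positive")
    return 100.0 * ip_signal / input_signal


def relative_m6a_level(target_ip: float, target_input: float,
                       spike_ip: float, spike_input: float) -> float:
    """Percent of input for the transcript, normalized to that of the
    positive control spike-in measured in the same condition."""
    target = percent_of_input(target_ip, target_input)
    spike = percent_of_input(spike_ip, spike_input)
    return target / spike


def fold_change_vs_mock(condition_level: float, mock_level: float) -> float:
    """Fold change of enrichment with the mock sample normalized to 1."""
    return condition_level / mock_level


# Hypothetical values (arbitrary linear units), for illustration only.
# Mock: 2% of input recovered for the transcript, 10% for the spike-in.
mock_level = relative_m6a_level(2.0, 100.0, 10.0, 100.0)
# Infected: input is three times higher (the transcript is induced),
# and 4% of input is recovered; the spike-in recovery is unchanged.
infected_level = relative_m6a_level(12.0, 300.0, 10.0, 100.0)

fold_change = fold_change_vs_mock(infected_level, mock_level)
print(f"fold change of m6A enrichment vs. mock: {fold_change:.2f}")
```

      In this toy case the reported fold change reflects the change in the modified fraction of the transcript, not the threefold increase in its abundance.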

      Reviewer #3 (Public Review):

      In this manuscript, the authors explored the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in the Miiuy croaker. They found that MDA5 can serve as a substitute for RIG-I in detecting 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) when RIG-I is absent in Miiuy croaker. Furthermore, they observed MDA5's recognition of 5'ppp-RNA in chickens (Gallus gallus), a species lacking RIG-I. Additionally, the authors documented that MDA5's functionality can be compromised by m6A-mediated methylation and degradation of MDA5 mRNA, orchestrated by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker during SCRV infection. This impairment compromises the innate antiviral immunity of fish, facilitating SCRV's immune evasion. These findings offer valuable insights into the adaptation and functional diversity of innate antiviral mechanisms in vertebrates.

      We extend our sincere appreciation for your professional comments and insightful suggestions on our manuscript, as they have significantly contributed to enhancing its quality.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The interpretation of Figures 1H and I, along with the captions, seems unclear. Particularly, understanding the meaning of the X-axis in Figure I is challenging. Additionally, the designation of "H2O = 1" on the Y-axis in Figure 1E lacks clarity. It would be helpful if the author could revise and clarify these figures for better comprehension.

      We appreciate your reminder and have corrected and clarified these figures and figure legends (lines 768-772). We have relabeled the Y-axis of Figure 1I as "Relative mRNA expression" instead of "Relative IFN-1 expression" (Figure 1I). In addition, we have added an explanation of "H2O=1" in the legend of Figure 1E.

      (2) The interpretation of Figure 5 in section 2.5 seems incomplete. The author mentioned that both m6A levels and MDA5 expression levels are increased (lines 256-257), prompting questions about the relationship between m6A and MDA5 expression. If higher m6A levels typically lead to MDA5 mRNA instability and lower MDA5 expression, observing both increasing simultaneously appears contradictory. Considering the dynamic changes shown in Figure 5, it would be more appropriate to propose an alteration in both m6A levels and MDA5 expression levels. Given the fluctuating nature of these changes, definitively labeling them as solely "increased" is challenging. Therefore, offering a nuanced interpretation of the results and clarifying this aspect would bolster the study's conclusions.

      While changes in m6A modification and the expression of m6A-modified transcripts are biologically relevant, identifying bona fide m6A alterations during viral infection will allow us to understand how m6A modification of cellular mRNA is regulated. As our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification of RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive control spike-in. Fold change of enrichment was calculated with mock samples normalized to 1. Therefore, the upregulation of MDA5 expression can partially explain the increase in m6A modification across all MDA5 mRNA in cells, but it cannot explain changes in m6A modification on each MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature. We hope for your understanding.

      In addition, although higher m6A levels often lead to unstable MDA5 mRNA and lower MDA5 expression, SCRV can affect MDA5 expression through multiple pathways. For example, since MDA5 is an interferon-stimulated gene, SCRV infection can cause strong expression of interferon and indirectly induce high-level expression of MDA5. Therefore, the increased expression of MDA5 does not contradict the simultaneous increase in MDA5 m6A modification (24 h). To further strengthen our experimental conclusions, we added a dual fluorescence experiment. The results indicate that SCRV infection can inhibit the fluorescence activity of MDA5-exon1 reporter plasmids that contain the m6A sites but not the promoter sequence of the MDA5 gene, and that this inhibitory effect can be counteracted by cycloleucine (CL, an amino acid analogue that can inhibit m6A modification) (Figure 5G). This further indicates that SCRV can reduce the expression of MDA5 through the m6A pathway.

      Finally, in light of the fluctuations in MDA5 expression levels, we have changed the subheadings of Results 2.5 section and provided a more comprehensive and precise elucidation of the experimental outcomes. We are grateful for your valuable feedback.

      (3) In the discussion section, it would indeed be advantageous for the author to explore the novelty of this work more comprehensively, moving beyond merely acknowledging the widespread loss of RIG-I and suggesting MDA5 as a compensatory mechanism. Considering the well-established roles of MDA5 and m6A in host-virus interactions, the findings of this study may seem familiar in light of previous research. To enhance the discussion, it would be valuable for the author to delve into the implications of this evolutionary model. For instance, does the compensation or loss of RIG-I impact a species' susceptibility to specific types of viruses? Exploring such questions would provide insight into the broader significance of this compensation model and its potential effects on host-virus interactions, thus adding depth to the study's contribution.

      We appreciate the expert advice provided by the referee. In response, we have expanded our discussion in the relevant section, addressing the potential influence of RIG-I deficiency and MDA5 compensation on the antiviral immune system in vertebrates (lines 371-376). Furthermore, we underscore the significance of exploring the impact of SCRV infection on MDA5 m6A modification, considering its compatibility with MDA5 as an ISG gene, in elucidating the host response to viral infection (lines 405-409).

      (4) To improve the manuscript, it would be beneficial if the editors could aid the author in refining the language. Many descriptions in the article are overly redundant, and there should be appropriate differentiation between experimental methods and results.

      We appreciate the reviewer’s comment. We have carefully revised the manuscript and removed redundant descriptions in the experimental results and methods.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed all of my concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study seeks to establish accurate computational models to explore the role of hydrodynamic interactions on energy savings and spatial patterns in fish schools. Specifically, the authors consider a system of (one degree-of-freedom) flapping airfoils that passively position themselves with respect to the streamwise direction, while oscillating at the same frequency and amplitude, with a given phase lag and at a constant cross-stream distance. By parametrically varying the phase lag and the cross-stream distance, they systematically explore the stability and energy costs of emergent configurations. Computational findings are leveraged to distill insights into universal relationships and clarify the role of the wake of the leading foil.

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      Strengths:

      (1) The use of multiple computational models (computational fluid dynamics, CFD, for full Navier-Stokes equations and computationally efficient inviscid vortex sheet, VS, model) offers an extra degree of reliability of the observed findings and backing to the use of simplified models for future research in more complex settings.

      (2) The systematic assessment of the stability and energy savings in multiple configurations of pairs and larger ensembles of flapping foils is an important addition to the literature.

      (3) The discovery of a linear phase-distance relationship in the formation attained by pairs of flapping foils is a significant contribution, which helps compare different experimental observations in the literature.

      (4) The observation of a critical size effect for in-line formations, above which cohesion and energetic benefits are lost at once, is a new discovery in the field.

      Thank you for this list of strengths – we are delighted that these ideas were clearly communicated in our manuscript.

      Note that Newbolt et al., PNAS, 2019 reported distance as a function of phase for pairs of flapping hydrofoils, and Li et al., Nat. Comm., 2020 also reported a phase-distance relationship in robotic and biological fish (calling it Vortex Phase Matching). We compiled their results, together with ours and other numerical and experimental results, showing that the linear distance-phase relationship is universal.

      Weaknesses:

      (1) The extent to which observations on one-degree-of-freedom flapping foils could translate to real fish schools is presently unclear so some of the conclusions on live fish schools are likely to be overstated and would benefit from some more biological framing.

      Thank you for bringing up this point. Indeed, flapping foils that are free to translate in both the x- and y-directions and rotate in the x-y plane could drift apart in the y-direction. However, this drift occurs on a much longer time scale than the forward swimming motion. For this reason, we feel justified in ignoring it for the purpose of this study, especially since the pairwise equilibria in the swimming x-direction are reached on a faster time scale.

      Below, we include two snapshots taken from published work from the group of Petros Koumoutsakos (Gazzola et al, SIAM 2014). The figures show, respectively, a pair and a group of five undulating swimmers, free to move and rotate in the x-y plane. The evolution of the two and five swimmers is computed in the absence of any control. The lateral drift is clearly sub-dominant to the forward motion. Similar results were reported in Verma et al, PNAS 2018.

      These results are independent of the details of the flow interactions model. For example, similar lateral drift is observed using the dipole model (Kanso & Tsang, FDR 2014, Tsang & Kanso, JNLS 2013).

      Another reason why we feel justified to ignore these additional degrees of freedom is the following: we assume a live fish or robotic vehicle would have feedback control mechanisms that correct for such drift. Given that it is a slowly-growing drift, we hypothesize that the organism or robot would have sufficient time to respond and correct its course.

      Indeed, in Zhu et al. 2022, an RL controller, which drives an individual fish-like swimmer to swim at a given speed and direction, when applied to pairs of swimmers, resulted in the pair "passively" forming a stable school without any additional information about each other.

      We edited page 4 of the main manuscript to include references to the work cited here and to explain the reasons for ignoring the lateral drift.

      Citations:  

      Gazzola, M., Hejazialhosseini, B., & Koumoutsakos, P. (2014). Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM Journal on Scientific Computing, 36(3), B622-B639. DOI: https://doi.org/10.1137/130943078

      Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854. DOI: https://doi.org/10.1073/pnas.1800923115

      Tsang, A. C. H., & Kanso, E. (2013). Dipole interactions in doubly periodic domains. Journal of Nonlinear Science, 23, 971-991. DOI: https://doi.org/10.1007/s00332-013-9174-5

      Kanso, E., & Tsang, A. C. H. (2014). Dipole models of self-propelled bodies. Fluid Dynamics Research, 46(6), 061407. DOI: https://doi.org/10.1088/0169-5983/46/6/061407

      Zhu, Y., Pang, J. H., & Tian, F. B. (2022). Stable schooling formations emerge from the combined effect of the active control and passive self-organization. Fluids, 7(1), 41. DOI: https://doi.org/10.3390/fluids7010041

      Author response image 1.

      Antiphase self-propelled anguilliform swimmers. (a) – (d) Wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ centre of mass trajectories.

      Author response image 2.

      Parallel schooling formation. (a) – (d) wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 7T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ center of mass trajectories.

      (2) The analysis of non-reciprocal coupling is not as novel as the rest of the study and potentially not as convincing due to the chosen linear metric of interaction (that is, the flow agreement).

      We thank the referee for this candid and constructive feedback. In fact, we view this aspect of the study as most “revolutionary” because it provides a novel approach to pre-computing the locations of stable equilibria even without doing expensive all-to-all coupled simulations or experiments.

      Basically, the idea is the following: you give me a flow field, it doesn’t matter how you obtained it, whether from simulations or experimentally, and I can tell you at what locations in this flow field a virtual flapping swimmer would be stable and save hydrodynamic energy!

      In the revised version, we changed pages 3 and 7 of the main text, and added a new section “Diagnostic tools” to the SI to better illustrate this.

      Overall, this is a rigorous effort on a critical topic: findings of the research can offer important insight into the hydrodynamics of fish schooling, stimulating interdisciplinary research at the interface of computational fluid mechanics and biology.

      We thank the referee again for their careful read of the manuscript and their constructive feedback.

      Reviewer #2 (Public Review):

      The document "Mapping spatial patterns to energetic benefits in groups of flow-coupled swimmers" by Heydari et al. uses several types of simulations and models to address aspects of stability of position and power consumption in few-body groups of pitching foils. I think the work has the potential to be a valuable and timely contribution to an important subject area. The supporting evidence is largely quite convincing, though some details could raise questions, and there is room for improvement in the presentation. My recommendations are focused on clarifying the presentation and perhaps spurring the authors to assess additional aspects:

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      (1) Why do the authors choose to set the swimmers free only in the propulsion direction? I can understand constraining all the positions/orientations for investigating the resulting forces and power, and I can also understand the value of allowing the bodies to be fully free in x, y, and their orientation angle to see if possible configurations spontaneously emerge from the flow interactions. But why constrain some degrees of freedom and not others? What's the motivation, and what's the relevance to animals, which are fully free?

      We would like to thank the referee for raising this point. It is similar to the point raised above by the first referee. As explained above, the reason is the following: in freely-swimming, hydrodynamically-interacting “fish,” the lateral drift is sub-dominant to the forward swimming motion. Therefore, we ignore it in the model. Please see our detailed response above for further clarification, and see the changes on page 4 of the main manuscript.

      (2) The model description in Eq. (1) and the surrounding text is confusing. Aren't the authors computing forces via CFD or the VS method and then simply driving the propulsive dynamics according to the net horizontal force? It seems then irrelevant to decompose things into thrust and drag, and it seems irrelevant to claim that the thrust comes from pressure and the drag from viscous effects. The latter claim may in fact be incorrect since the body has a shape and the normal and tangential components of the surface stress along the body may be complex.

      Thank you for pointing this out! It is indeed confusing.

      In the CFD simulations, we compute the net force in the swimming x-direction by integrating the force density, as defined from the stress tensor. There is no ambiguity here.

      In the VS simulations, however, we compute the net force in the swimming x-direction by integrating the pressure jump across a plate of zero thickness. There is no viscous drag in this model; viscous drag is added by hand, so to speak. This method for adding viscous drag in the context of the VS model is not new; it has been used before in the literature, as explained in the SI section “Vortex sheet (VS) model” (pages 30 and 31).

      (3) The parameter taudiss in the VS simulations takes on unusual values such as 2.45T, making it seem like this value is somehow very special, and perhaps 2.44 or 2.46 would lead to significantly different results. If the value is special, the authors should discuss and assess it. Otherwise, I recommend picking a round value, like 2 or 3, which would avoid distraction.

      Response: The dissipation time is introduced both to model viscous effects and to reduce computational complexity. Introducing it does indeed add forcing to the simulation. A round value, like 2 or 3, is an integer multiple of the flapping period, which is normalized to T=1. Therefore, an integer value of taudiss would cause forcing at the resonant frequency and lead to computational blow-up. To avoid this effect, a non-integer choice such as taudiss = 2.45, 2.44, or 2.46 would be fine and would lead to only a small perturbation of the overall simulation, compared to no dissipation at all. This effect is studied in detail in the following published work from our group:

      Huang, Y., Ristroph, L., Luhar, M., & Kanso, E. (2018). Bistability in the rotational motion of rigid and flexible flyers. Journal of Fluid Mechanics, 849, 1043-1067. DOI: https://doi.org/10.1017/jfm.2018.446
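
      As a minimal illustration of the commensurability argument above, the following Python sketch checks whether a candidate dissipation time is an integer multiple of the flapping period. The function names and the tolerance are hypothetical and are not taken from the VS code; the sketch only restates the arithmetic of the argument.

```python
def phase_offset(tau_diss, period=1.0):
    """Fractional part of tau_diss / period, in [0, 1).

    A value of 0 means that vortices are removed exactly in step with
    the flapping cycle (the resonant case described in the text).
    """
    return (tau_diss / period) % 1.0


def is_commensurate(tau_diss, period=1.0, tol=1e-9):
    """True if tau_diss is (numerically) an integer multiple of the period.

    The tolerance `tol` is an assumption made for this illustration.
    """
    frac = phase_offset(tau_diss, period)
    return min(frac, 1.0 - frac) < tol


# Candidate dissipation times, in units of the flapping period T = 1.
candidates = [2.0, 2.44, 2.45, 2.46, 3.0]
flags = {tau: is_commensurate(tau) for tau in candidates}
```

      With T = 1, the round values 2 and 3 are flagged as commensurate with the flapping period, while 2.44, 2.45, and 2.46 are not.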

      (4) Some of the COT plots/information were difficult to interpret because the correspondence of beneficial with the mathematical sign was changing. For example, DeltaCOT as introduced on p. 5 is such that negative indicates bad energetics as compared to a solo swimmer. But elsewhere, lower or more negative COT is good in terms of savings. Given the many plots, large amounts of data, and many quantities being assessed, the paper needs a highly uniform presentation to aid the reader.

      Thank you for pointing this out! We updated Figures 3 and 6 as suggested.

      (5) I didn't understand the value of the "flow agreement parameter," and I didn't understand the authors' interpretation of its significance. Firstly, it would help if this and all other quantities were given explicit definitions as complete equations (including normalization). As I understand it, the quantity indicates the match of the flow velocity at some location with the flapping velocity of a "ghost swimmer" at that location. This does not seem to be exactly relevant to the equilibrium locations. In particular, if the match were perfect, then the swimmer would generate no relative flow and thus no thrust, meaning such a location could not be an equilibrium. So, some degree of mismatch seems necessary. I believe such a mismatch is indeed present, but the plots such as those in Figure 4 may disguise the effect. The color bar is saturated to the point of essentially being three tones (blue, white, red), so we cannot see that the observed equilibria are likely between the max and min values of this parameter.

      Thank you for pointing this out! You are correct in your understanding of the flow agreement parameter, but not in your interpretation.

      Basically, the statement “if the match were perfect, then the swimmer would generate no relative flow and thus no thrust” means that such a location is an equilibrium, not that it “could not be” one. Let me elaborate. An equilibrium is a location at which the net thrust force is zero. The equilibrium is stable if the slope of the thrust force is negative. Ideally, this is what maximizing the flow agreement parameter would produce.

      For example, consider an ideal fluid where the flow velocity is of the form  in the vertical direction. Consider a “ghost swimmer” heaving at a velocity  . Under this scenario, the flow agreement and thrust parameters are

      Let’s now consider a balance of forces on the “ghost swimmer.” The ghost swimmer is in relative equilibrium if and only if:

      It gives us

      We then consider stability at this equilibrium by calculating the derivative of thrust parameter over phase

      The corresponding values at equilibria are

      Thus, when taking the positive which means the equilibrium is a stable fixed point. We included this analysis in a new section on page 32 of the SI.
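
      The criterion stated above (zero net thrust force at equilibrium, negative slope for stability) can be sketched numerically as follows. This is a minimal sketch and not the analysis in the SI: the periodic force curve `toy_force` is a hypothetical stand-in for the force on a follower as a function of spacing, and the function names, search interval, and grid resolution are assumptions chosen for illustration.

```python
import math


def find_equilibria(force, d_min, d_max, n_cells=1000, h=1e-6):
    """Return (d_eq, is_stable) pairs for sign changes of force(d) on [d_min, d_max].

    Zeros that fall exactly on a cell edge are skipped in this sketch.
    """
    step = (d_max - d_min) / n_cells
    found = []
    for i in range(n_cells):
        lo = d_min + i * step
        hi = lo + step
        f_lo, f_hi = force(lo), force(hi)
        if f_lo * f_hi >= 0.0:
            continue  # no strict sign change inside this cell
        # Bisection: shrink [lo, hi] around the zero of the force.
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            f_mid = force(mid)
            if f_lo * f_mid <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        d_eq = 0.5 * (lo + hi)
        # Central-difference slope; a negative slope marks a stable equilibrium.
        slope = (force(d_eq + h) - force(d_eq - h)) / (2.0 * h)
        found.append((d_eq, slope < 0.0))
    return found


def toy_force(d):
    """Hypothetical periodic net force versus spacing d (illustration only)."""
    return -math.sin(2.0 * math.pi * d)


equilibria = find_equilibria(toy_force, 0.3, 2.2)
stable = [round(d, 6) for d, is_stable in equilibria if is_stable]
```

      For this toy force, the zeros alternate between unstable and stable; only the stable ones would be expected to emerge in a forward-time simulation.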

      (6) More generally, and related to the above, I am favorable towards the authors' attempts to find approximate flow metrics that could be used to predict the equilibrium positions and their stability, but I think the reasoning needs to be more solid. It seems the authors are seeking a parameter that can indicate equilibrium and another that can indicate stability. Can they clearly lay out the motivation behind any proposed metrics, and clearly present complete equations for their definitions? Further, is there a related power metric that can be appropriately defined and which proves to be useful?

      Thank you. These are excellent suggestions. Indeed, we needed to better explain the motivation and equations. Perhaps the main idea for these metrics can be best understood when explained in the context of the simpler particle model, which we now do in the SI and explain in the main text.

      (7) Why do the authors not carry out CFD simulations on the larger groups? Some explanations should be given, or some corresponding CFD simulations should be carried out. It would be interesting if CFD simulations were done and included, especially for the in-line case of many swimmers. This is because the results seem to be quite nuanced and dependent on many-body effects beyond nearest-neighbor interactions. It would certainly be comforting to see something similar happen in CFD.

      We are using an open-source version of the Immersed Boundary Method that is not specifically optimized for many interacting swimmers, so the computational cost of performing CFD simulations for more swimmers is high. Therefore, we used the CFD simulations sporadically with fewer swimmers (2 or 3) and performed systematic simulations in the context of the VS model.

      For the same Reynolds number as in Figure 1, we simulated three and four swimmers in CFD: three swimmers form a stable formation, whereas four swimmers do not, with the fourth swimmer colliding with the third one, consistent with the VS model. Results are included in Figure 8 of the SI.

      (8) Related to the above, the authors should discuss seemingly significant differences in their results for long in-line formations as compared to the CFD work of Peng et al. [48]. That work showed apparently stable groups for numbers of swimmers quite larger than that studied here. Why such a qualitatively different result, and how should we interpret these differences regarding the more general issue of the stability of tandem groups?

      Thank you for bringing up this important comparison. Peng et al. [48] (Hydrodynamic schooling of multiple self-propelled flapping plates) studied inline configurations of flapping airfoils at Reynolds number 200. There are several differences between their work and ours. The most important one is that they used a flexible plate, which makes the swimmer more adaptive to changes in the flow field (e.g., through changes in tailbeat amplitude and in phase along its body) and diverts some of the hydrodynamic energy to elastic energy. We edited the main text on page 10, at the end of the section “Critical size of inline formations beyond which cohesion is lost,” to explain this distinction.

      (9) The authors seem to have all the tools needed to address the general question about how dynamically stable configurations relate to those that are energetically optimal. Are stable solutions optimal, or not? This would seem to have very important implications for animal groups, and the work addresses closely related topics but seems to miss the opportunity to give a definitive answer to this big question.

      Indeed, that is exactly the point: in pairwise formations, stable configurations are also energetically optimal! In larger groups, there is no unique stable configuration; each stable configuration is associated with a different degree of energy savings. Interestingly, when exploring various equilibrium configurations in a school of four, we found the diamond formation of D. Weihs, Nature, 1972 to be both stable and the most energetically favorable among the configurations we tested. However, claiming this as a global optimum may be misleading. Our standpoint is that fish schools are always dynamic and that there are opportunities for energy savings in more than one stable configuration.

      We added a new section, “Mapping emergent spatial patterns to energetic benefits”, a new figure in the main text (Fig. 10), and a new figure in the SI (Fig. S8).

      (10) Time-delay particle model: This model seems to construct a simplified wake flow. But does the constructed flow satisfy basic properties that we demand of any flow, such as being divergence-free? If not, then the formulation may be troublesome.

      The simplified wake flow captures the hydrodynamic trail left by the swimmer in a very simplified manner. In the limit of small amplitude, it should be consistent with the inviscid vortex sheet shed in T. Wu’s waving swimmer model (Wu TY. 1961).

      The model was compared to experiments and used in several recent publications from the Courant Institute (Newbolt et al. 2019, 2022, 2024).

      Citations:  

      Wu, T. Y. T. (1961). Swimming of a waving plate. Journal of Fluid Mechanics, 10(3), 321-344. DOI: https://doi.org/10.1017/S0022112061000949

      Newbolt, J. W., Lewis, N., Bleu, M., Wu, J., Mavroyiakoumou, C., Ramananarivo, S., & Ristroph, L. (2024). Flow interactions lead to self-organized flight formations disrupted by self-amplifying waves. Nature Communications, 15(1), 3462. DOI: https://doi.org/10.1038/s41467-024-47525-9

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2022). Lateral flow interactions enhance speed and stabilize formations of flapping swimmers. Physical Review Fluids, 7(6), L061101. DOI: https://doi.org/10.1103/PhysRevFluids.7.L061101

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2019). Flow interactions between uncoordinated flapping swimmers give rise to group cohesion. Proceedings of the National Academy of Sciences, 116(7), 2419-2424. DOI: https://doi.org/10.1073/pnas.1816098116

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on such a comprehensive and well-thought-out study; I truly enjoyed reading it and have only a couple of suggestions that I believe will help further strengthen the paper. I am including a bunch of references here that are very familiar to me without the expectation of you to include them all, just to point at areas that I feel you might consider useful.

      We thank the referee again for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      First, I believe that some more rationale is needed to justify the chosen modeling framework. I am fully aware of how difficult it is to run these simulations, but I see some critical assumptions that need to be at least spelled out for the reader to appreciate the limitations of the study: (1) Constraining the cross-stream coordinate (a stability analysis should include perturbations on the cross-stream coordinate as well, see, for example, https://doi.org/10.1017/flo.2023.25 -- I know this is much simpler as it discards any vortex shedding) and (2) Assuming equal frequency and amplitude (there are studies showing variation of tail beat frequency in animals depending on their position in the school, see, for example, https://doi.org/10.1007/s00265-014-1834-4).

      Thank you for these suggestions. These are indeed important and interesting points to discuss in the manuscript. See response above regarding point 1. Regarding point 2, this is of course important and will be pursued in future extensions of this work. We edited the intro and discussion of the main text to explain this.

      In the paper “Stability of schooling patterns of a fish pair swimming against a flow”, the authors considered a pair of swimmers in a channel. They analyzed the stability of the system and found multiple equilibria, including inline and staggered formations and a special formation perpendicular to the wall. Studying fish schools in confined domains and analyzing their stability is very interesting. We added a citation to this paper in the discussion section at the end of page 10.

      In the paper “Fish swimming in schools save energy regardless of their spatial position”, the authors quantified the reduction in power of fish by measuring tail beat frequency and oxygen consumption and compared them to measurements in solitary fish. They found that, in a school of fish, individuals always save power compared to swimming alone. However, there is one important caveat in this study: they considered a larger school of fish and expressed the results in terms of pairwise configurations (see the schematics we draw below). This is misleading because it may suggest that formations with only two fish provide benefits to each other, while in fact the data are obtained from a larger school with many neighbors. The authors only consider a fish’s relationship to its nearest neighbor, but in a large school, other neighbors will also influence its energy consumption. In the schematics below, we highlight several focal fish, marked in red, green, and blue. We also mark their nearest neighbors using the same color, but lighter. The nearest neighbors are what the authors use to define the neighbor relationship. A problematic example is the red fish, whose nearest neighbor is behind it, but whose power saving may in fact come from the other neighbors, which are around or ahead of it.

      Author response image 3.

      Second, I would like to see more biology context with respect to limitations that are inherent to a purely mechanical model, including, neglecting vision that we know plays a synergistic role in determining schooling patterns. For example, a recent study https://doi.org/10.1016/j.beproc.2022.104767 has presented experiments on fish swimming in the dark and in bright conditions, showing that it is unlikely that hydrodynamics alone could explain typically observed swimming patterns in the literature.

      Thank you for this suggestion and for sharing with us the paper “Collective response of fish to combined manipulations of illumination and flow”. This is a great study, and we are sorry to have missed it.

      In this paper, the authors found that, under illumination, fish swim more cohesively, which is consistent with another paper we already cited, “The sensory basis of schooling by intermittent swimming in the rummy-nose tetra (Hemigrammus rhodostomus)”. Another important conclusion of this paper is that, under brighter illumination and with flow, fish schools spend more time side by side. This connects well to the conclusion of another paper we cited, “Simple phalanx pattern leads to energy saving in cohesive fish schooling,” where at lower flow speed in a water channel, fish tended to form a dynamic school, while at higher flow speed they organized in a side-by-side (phalanx) configuration. This conclusion is consistent with our finding that, in side-by-side formations, fish share power savings.

      Importantly, it is well known that both vision and flow sensing play important roles in fish schooling. This study aimed to merely explore what is possible through passive hydrodynamic interactions, without visual and flow sensing and response. We clarify this in the revised version of the manuscript.

      Third, I am not too convinced about the flow agreement metric, which only accounts for linear interactions between the foils. More sophisticated approaches could be utilized as the one proposed here https://doi.org/10.1017/jfm.2018.369, based on a truly model-agnostic view of the interaction - therein, the authors show non-reciprocal (in strength and time-scale) coupling between two in-line flapping foils using information theory. I also would like to mention this older paper https://doi.org/10.1098/rsif.2012.0084, where an equivalent argument about the positioning of a trailing fish with respect to a leading robotic fish is made from experimental observations.

      Thank you for these remarks and for sharing these two interesting papers.

      The flow agreement metric is not specific to two fish, as we show in Fig. 6 of the manuscript. We edited the manuscript and SI to better explain the motivation and implementation of the flow agreement parameter. We edited the main text (see revisions on page 7) and added a new section called “diagnostic tools.”

      In the paper “An information-theoretic approach to study fluid–structure interactions”, the authors calculate the transfer entropy between two oscillating airfoils when they are hydrodynamically coupled.  This is an interesting study! We will apply this approach to analyzing larger schools in the future. We cited this paper in the introduction.

      In the paper “Fish and robots swimming together: attraction towards the robot demands biomimetic locomotion”, the authors found that fish will swim behind an artificial fish robot, especially when the robot is beating its tail rather than remaining static. Under specific conditions, the fish hold station behind the robot, which may be due to the hydrodynamic advantage obtained by swimming in the robot’s wake. DPIV resolved the wake behind a static or beating fish robot, but the flow field was not visualized with the fish present. This study is similar to a paper we already cited, “In-line swimming dynamics revealed by fish interacting with a robotic mechanism”, in which the authors considered fish-foil interaction. In the revised manuscript, we cite both papers.

      Regarding the reviewer’s comment that the flow agreement metric only accounts for linear interactions between the foils, we would like to clarify. The flow agreement parameter is a nonlinear metric that considers the interaction between a virtual swimmer and an arbitrary unsteady flow field. Although the metric is a linear function of the swimmer’s speed, it is a nonlinear function of spacing and phase, which are the quantities we care about. Moreover, the flow field can be generated by either experiment or CFD simulation, and behind one or more swimmers. It is true that this is a one-way coupled system, since the virtual swimmer does not perturb the flow field.
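
      To make the linear versus nonlinear distinction concrete, here is a toy Python sketch. It is not the definition used in the manuscript: it assumes a traveling-wave transverse flow, a ghost swimmer with a sinusoidal heaving velocity, and an unnormalized time average of their product over one period. All names, the wave form, and the absence of normalization are assumptions made for illustration.

```python
import math


def flow_agreement(spacing, phase, flow_amp=1.0, ghost_vel_amp=1.0,
                   wavelength=1.0, freq=1.0, n_samples=1000):
    """Time-averaged product of a toy wake velocity and a ghost swimmer's
    heaving velocity over one flapping period (no normalization applied)."""
    omega = 2.0 * math.pi * freq
    k = 2.0 * math.pi / wavelength
    total = 0.0
    for i in range(n_samples):
        t = i / (n_samples * freq)  # uniform samples over one period
        v_flow = flow_amp * math.sin(omega * t - k * spacing)
        v_ghost = ghost_vel_amp * math.cos(omega * t - phase)
        total += v_flow * v_ghost
    return total / n_samples


base = flow_agreement(0.0, math.pi / 2)
doubled_amp = flow_agreement(0.0, math.pi / 2, ghost_vel_amp=2.0)
quarter_wave = flow_agreement(0.25, math.pi / 2)
half_wave = flow_agreement(0.5, math.pi / 2)
```

      In this toy setting the average equals 0.5 * flow_amp * ghost_vel_amp * sin(phase - 2π * spacing / wavelength), so doubling the ghost swimmer's velocity amplitude doubles the value, whereas changing the spacing or phase moves it sinusoidally between positive and negative values.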

      Again, this is great work and I hope these suggestions are of help.

      Thank you again! We are delighted to receive such positive and constructive feedback.

      Reviewer #2 (Recommendations For The Authors):

      (1) About Figure 1: Panel C should be made to match between CFD and VS with regard to the swimmer positions. Also, if the general goal of the figure is to compare CFD and VS, then how about showing a difference map of the velocity fields as a third column of panels across A-D?

      Thank you for pointing this out. Figure 1 C is updated accordingly.

      The general goal is to show that the CFD and VS simulations produce qualitatively similar results. Some quantities are not the same across models, e.g., the swimming speeds of the swimmers are different, but the scaled distance is the same.

      (2) Figure 3: In A, it would be nice to keep the y-axis the same across all plots, which would aid quick visual comparison. In B, the legend labels for CFD and VS should be filled in with color so that the reader can more easily connect to the markers in the plot.

      Thank you for pointing this out. We have updated Figures 3 and 6.

      (3) Figures 4, 9, and Supplementary Figures too: As mentioned previously, the agreement parameter plots are saturated in the color map, possibly obscuring more detailed information.

      Thank you for pointing this out. The goal is to show that there is a large region with positive flow agreement parameter.

      We took the flow agreement field behind a single swimmer in the VS simulation (Fig. 4B) and added contour lines to it (at values 0.25 and 0.5). Not many details are hidden by the saturated colormap.

      Author response image 4.

      We also updated Fig 4 and Fig 9 accordingly.

      (4) Figure 6: Is this CFD or VS? Why show one or the other and not both? In B, it seems that there are only savings available and no energetically costly positions. This seems odd. In C, it seems the absolute value on dF/dd is suppressing some important information about stability - the sign of this seems important. In E, the color bar seems to be reflected from what is standard, i.e. 0 on the left and 100 on the right, as in F.

      Thank you for asking. Fig. 6 is based only on VS simulations. There are hundreds of simulations in this figure; to save computational effort, we did not run the corresponding CFD simulations. Representative CFD simulations are shown in Figures 1, 2, and 3 for comparison. We added a sentence in the figure caption for clarification.

      In C, since dF/dd is always negative for emergent formations (only stable equilibria can appear during forward time simulation), we show its absolute value for comparison.

      In E, we flip the color bar because a larger flow agreement parameter corresponds to more power saving, in other words, to negative changes in COT.

      (5) Fig. 8: For cases such as in D that have >100% power savings, does this mean that the swimmer has work done by the flow? How to interpret this physically for a flapping foil and biologically for a fish?

      Yes, it means the hydrofoil/fish gets a free ride and is even able to harvest energy from the incoming flow. Similar phenomena have been reported in the biology and engineering literature. For example, Liao et al. 2003 and Beal et al. 2006 found that live or dead fish can harvest energy from incoming vortical flow by modulating their body curvature.

      In engineering, Chen et al. 2018 and Ribeiro et al. 2021 found that the following airfoil in a tandem (inline) formation can harvest energy from the wake of the leading swimmer in both simulations and experiments.

      Citations:  

      Liao, J. C., Beal, D. N., Lauder, G. V., & Triantafyllou, M. S. (2003). Fish exploiting vortices decrease muscle activity. Science, 302(5650), 1566-1569. DOI: https://doi.org/10.1126/science.1088295

      Beal, D. N., Hover, F. S., Triantafyllou, M. S., Liao, J. C., & Lauder, G. V. (2006). Passive propulsion in vortex wakes. Journal of Fluid Mechanics, 549, 385-402. DOI: https://doi.org/10.1017/S0022112005007925

      Chen, Y., Nan, J., & Wu, J. (2018). Wake effect on a semi-active flapping foil based energy harvester by a rotating foil. Computers & Fluids, 160, 51-63. DOI: https://doi.org/10.1016/j.compfluid.2017.10.024

      Ribeiro, B. L. R., Su, Y., Guillaumin, Q., Breuer, K. S., & Franck, J. A. (2021). Wake-foil interactions and energy harvesting efficiency in tandem oscillating foils. Physical Review Fluids, 6(7), 074703. DOI: https://doi.org/10.1103/PhysRevFluids.6.074703

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer 1 summarized that: In this revised version of the manuscript, the authors have made important modifications in the text, inserted new data analyses, and incorporated additional references, as recommended by the reviewers. These modifications have significantly improved the quality of the manuscript.

      We are grateful for the reviewer's positive recognition of our revisions.

      Reviewer 2 noted that:

      (1) The authors do not show if the PVT mediates dPAG to BLA communication with any functional behavioral assay.

      We appreciate the reviewer’s suggestion to include a functional assay to investigate the role of the PVT in mediating communication between the dPAG and BLA. Our primary objective was to confirm the upstream role of the dPAG in processing and relaying naturalistic predatory threat information to the BLA, thereby broadening our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms.

      Given previous anatomical findings indicating the absence of direct monosynaptic projections from the dPAG to the BLA (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003), we employed both anterograde and retrograde tracers, supplemented by c-Fos expression analysis following predatory threats, to explore possible routes through which threat signals may be conveyed from the dPAG to the BLA. Our findings indicated significant activity within the midline thalamic regions, particularly the PVT as a mediator of dPAG-BLA interactions, corroborating the possibility of dPAG→BLA information flow.

      Investigating the PVT's functional role appropriately would require single-unit recordings, correlation analysis of PVT neuronal responses with dPAG and BLA neuronal responses, and pathway-specific causal techniques, involving other midline thalamic regions for controls. This comprehensive study would represent an independent study.

      In response to previous feedback, we have carefully revised our manuscript to moderate the emphasis on the PVT's role. The Abstract, Results, and Discussion now refer more broadly to "midline thalamic regions" and “The midline thalamus” (subheading) rather than specifically to the PVT. In the Introduction, we mention that the PVT "may be part of a network that conveys predatory threat information from the dPAG to the BLA." Our conclusions about the functional interaction between the dPAG and BLA, which broaden the view of Pavlovian fear conditioning, are not contingent on confirming a specific intermediary role for the PVT.

      (2) The authors also do not thoroughly characterize the activity of BLA cells during the predatory assay.

      Our previous studies have extensively detailed BLA cell firing characteristics, including their responsiveness to food and/or a robot predator during the predatory assay (Kim et al. 2018, Kong et al. 2021), and compared these findings to other predator studies (Amir et al. 2019, Amir et al. 2015). In the current study, out of 85 BLA cells, 3 were food-specific and 4 responded to both the pellet and the robot, with none of these 7 cells responding to dPAG stimulation.

      Given our earlier findings of the immediate responses of BLA neurons to robot activation, we specifically examined whether robot-responsive BLA neurons receive signals from the dPAG. For this analysis, we excluded all food-related cells (pellet cells and BOTH cells) and focused on the time window immediately after robot activation (within 500 ms after robot onset). This approach enabled us to avoid potential confounds from residual effects of robot-induced immediate BLA responses during the animals’ flight and nest entry behaviors.

      Furthermore, as previously described, the robot is programmed to move forward a fixed distance and then return, repeatedly triggering foraging behavior. This setup facilitates the analysis of neural changes during food approach and predator avoidance conflicts. However, animals quickly adapt to the robot, reducing freezing and stretch-attend behaviors, making time-stamped analysis of these behaviors unfeasible.

      We would like to highlight that the present study explicitly focused on demonstrating whether BLA neurons that responded to intrinsic dPAG optogenetic stimulation also responded to extrinsic predatory robot activation, and compared their firing characteristics to those BLA neurons that did not respond to dPAG stimulation (Figure 3). This targeted analysis provides insights into the responsiveness of BLA neurons to both intrinsic and extrinsic stimuli, furthering our understanding of the dPAG-BLA interaction in the context of predatory threats.

      Reviewer 3 also raised no concerns and stated that: The series of experiments provide a compelling case for supporting their conclusions. The study brings important concepts revealing dynamics of fear-related circuits particularly attractive to a broad audience, from basic scientists interested in neural circuits to psychiatrists.

      We sincerely thank the reviewer for the positive feedback on our revisions.

      Recommendations for the Authors

      Reviewer 1: There are a few minor concerns that the authors may want to fix:

      (1) Point 5) The sentence: "The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions" is hard to follow because the ventrolateral subdivision is not part of the dPAG. The authors may want to say specific subregions of the PAG instead. It is also unclear why transgenic animals would be needed for this projection-defined manipulations. The combination of retrograde Cre-recombinase virus with inhibitory opsin or chemogenetic approach may be sufficient.

      We appreciate the reviewer’s insightful feedback regarding our description of the dPAG and the use of transgenic mice in future studies. As suggested, we have corrected the manuscript to exclude the 'ventrolateral' subdivision from the dPAG description, now accurately aligning with pioneering studies (Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993) that designated dPAG as including the dorsomedial (dmPAG), dorsolateral (dlPAG) and lateral (lPAG) regions, as cited in our revised manuscript.

      We acknowledge the reviewer’s helpful suggestion regarding the use of retrograde Cre-recombinase virus with inhibitory opsins or chemogenetic approaches as viable alternatives. These methods have been incorporated into our discussion (pages 14-15): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG-PVT circuit is necessary for the BLA’s response to predatory threats. To establish causality and interregional relationships, future studies should employ methods such as pathway-specific optogenetic inhibition (using retrograde Cre-recombinase virus with inhibitory opsins; Lavoie and Liu 2020, Li et al. 2016, Senn et al. 2014) or chemogenetics (Boender et al. 2014, Roth 2016) in conjunction with single unit recordings to fully characterize the dPAG-PVT-BLA circuitry’s (as opposed to other midline thalamic regions for controls) role in processing predatory threat-induced escape behavior. If inactivating the dPAG-PVT circuits reduces the BLA's response to threats, this would highlight the central role of the dPAG-PVT pathway in this defense mechanism. Conversely, if the BLA's response remains unchanged despite dPAG-PVT inactivation, it could suggest the existence of multiple pathways for antipredatory defenses.”

      This revision addresses the critique by clarifying the anatomical description of the dPAG and emphasizing the feasibility of using targeted viral approaches without the necessity for transgenic animals.

      (2) Point 6e) The authors mentioned that "pellet retrieval" was indicated by the animal entering a designated zone 19 cm from the pellet, driven by hunger. Entering the zone 19 cm from the pellet should be labeled as food approach rather than food retrieval, because on many occasions the animals may be some seconds away from grabbing the pellet.

      We agree and have incorporated the change (pg. 22).

      (3) Point 11) We would strongly recommend that the authors replace the term "looming" with "approaching" to avoid confusion with several previous studies looking at defensive behaviors in response to looming induced by the shadow of an object moving closer to the eyes.

      Done.

      (4) Point 17) The authors mentioned that "A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J." However, the figure indicates a total of 9 ChR2 and 4 controls.

      We apologize for the confusion in our previous author responses. To examine the optical stimulation effects on behavior in Fig. 2G-J, we used a total of 9 ChR2 and 4 EYFP rats. The experimental sequence is detailed in the previously revised manuscript (pg. 20): “For optical stimulation and behavioral experiments, the procedure included 3 baseline trials with the pellet placed 75 cm away, followed by 3 dPAG stimulation trials with the pellet locations sequentially set at 75 cm, 50 cm, and 25 cm. During each approach to the pellet, rats received 473-nm light stimulation (1-2 s, 20-Hz, 10-ms width, 1-3 mW) through a laser (Opto Engine LLC) and a pulse generator (Master-8; A.M.P.I.). Additional testing to examine the functional response curves was conducted over multiple days, with incremental adjustments to the stimulation parameters (intensity, frequency, duration) after confirming that normal baseline foraging behavior was maintained. For these tests, one parameter was adjusted incrementally while the others were held constant (intensity curve at 20 Hz, 2 s; frequency curve at 3 mW, 2 s; duration curve at 20 Hz, 3 mW). If the rat failed to procure the pellet within 3 min, the gate was closed, and the trial was concluded.”

      This clarification ensures that the actual number of animals used is accurately reflected and aligns with the figure data, addressing the reviewer's concern.
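As a rough illustration of the stimulation parameters quoted above (20 Hz, 10-ms pulse width, 1-2 s trains), the following Python sketch works out the number of pulses, the duty cycle, and the total light-on time per train. It assumes an idealized square-pulse train; the function name `pulse_train_summary` is hypothetical and is not taken from the authors' methods.

```python
def pulse_train_summary(freq_hz: float, width_ms: float, duration_s: float) -> dict:
    """Summarize an idealized square-pulse train (hypothetical helper, not the authors' code)."""
    period_ms = 1000.0 / freq_hz  # time from one pulse onset to the next
    if width_ms > period_ms:
        raise ValueError("pulse width exceeds the inter-pulse period")
    n_pulses = round(freq_hz * duration_s)  # pulses delivered in one train
    duty_cycle = width_ms / period_ms       # fraction of the train with light on
    light_on_ms = n_pulses * width_ms       # cumulative light-on time per train
    return {"n_pulses": n_pulses, "duty_cycle": duty_cycle, "light_on_ms": light_on_ms}


# Parameters quoted in the response: 20 Hz, 10-ms pulses, 1-2 s trains.
for duration in (1.0, 2.0):
    print(duration, pulse_train_summary(20.0, 10.0, duration))
```

Under these assumptions, a 1-s train contains 20 pulses at a 20% duty cycle, and a 2-s train contains 40.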

      Reviewer 2: The authors made important changes in the text to address study limitations, including citations requested by the Reviewers and additional discussions about how this work fits into the existing literature. These changes have strengthened the manuscript.

      (1) However, the authors did not perform new experiments to address any of the issues raised in the previous round of reviews. For example, they did not make optogenetic manipulations of the pathway including the PVT, and did not add any loss of function experiments. The justification that these experiments are better suited for future reports using mice is not convincing, because hundreds of papers performing these types of circuit dissection assays have been performed in rats.

      We appreciate the reviewer's comments regarding the experimental scope of our study. Our study’s primary objective was to explore the dPAG’s upstream functional role in processing and conveying naturalistic predatory threat information to the BLA, extending our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms. We believe that our findings effectively address this goal.

      Our use of anterograde and retrograde tracers, supplemented by c-Fos expression analysis in response to predatory threats, was primarily conducted to verify the possibility of the dPAG→BLA information flow during predator encounters. This involved exploring potential routes through which threat signals might be conveyed from the dPAG to the BLA, given the lack of direct monosynaptic projections from the dPAG to BLA neurons (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003). This methodology helped us identify a potential structure, PVT, for more in-depth future studies. A thorough examination of the PVT's role would require single-unit recordings and causal techniques, incorporating other midline thalamic regions as controls, representing a significant and separate study on its own.

      In response to prior feedback, we have carefully revised our manuscript to generally address the role of "midline thalamic regions" rather than focusing specifically on the PVT. We wish to emphasize that our findings, which illustrate unique functional interactions between the dPAG and BLA in response to a predatory imminence, remain compelling and informative even without definitive evidence of the PVT’s involvement.

      Reviewer 3: In the revised version of the manuscript, the authors addressed adequately all the concerns raised by the reviewers. 

      We thank the reviewer for the thoughtful feedback on the earlier version of our manuscript and for reexamining the revisions we have made.

      References

      Amir, A., P. Kyriazi, S. C. Lee, D. B. Headley, and D. Pare. 2019. "Basolateral amygdala neurons are activated during threat expectation." J Neurophysiol 121 (5):1761-1777.

      Amir, A., S. C. Lee, D. B. Headley, M. M. Herzallah, and D. Pare. 2015. "Amygdala Signaling during Foraging in a Hazardous Environment." J Neurosci 35 (38):12994-3005.

      Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization." Prog Brain Res 87:269-305.

      Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression." Prog Brain Res 107:285-300.

      Boender, A. J., J. W. de Jong, L. Boekhoudt, M. C. Luijendijk, G. van der Plasse, and R. A. Adan. 2014. "Combined use of the canine adenovirus-2 and DREADD-technology to activate specific neural pathways in vivo." PLoS One 9 (4):e95392.

      Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections." J Comp Neurol 351 (4):585-601.

      Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization." Behav Brain Res 58 (1-2):27-47.

      Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats." Sci Adv 4 (4):eaar7328.

      Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network." Elife 10.

      Lavoie, A., and B. H. Liu. 2020. "Canine Adenovirus 2: A Natural Choice for Brain Circuit Dissection." Front Mol Neurosci 13:9.

      Li, Y., L. Hickey, R. Perrins, E. Werlen, A. A. Patel, S. Hirschberg, M. W. Jones, S. Salinas, E. J. Kremer, and A. E. Pickering. 2016. "Retrograde optogenetic characterization of the pontospinal module of the locus coeruleus with a canine adenoviral vector." Brain Res 1641 (Pt B):274-90.

      McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit." Trends Neurosci 34 (6):283-92.

      Roth, B. L. 2016. "DREADDs for Neuroscientists." Neuron 89 (4):683-94.

      Senn, V., S. B. Wolff, C. Herry, F. Grenier, I. Ehrlich, J. Grundemann, J. P. Fadok, C. Muller, J. J. Letzkus, and A. Luthi. 2014. "Long-range connectivity defines behavioral specificity of amygdala neurons." Neuron 81 (2):428-37.

      Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear." Braz J Med Biol Res 36 (5):557-66.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The author presents the discovery and characterization of CAPSL as a potential gene linked to Familial Exudative Vitreoretinopathy (FEVR), identifying one nonsense and one missense mutation within CAPSL in two distinct patient families afflicted by FEVR. Cell transfection assays suggest that the missense mutation adversely affects protein levels when overexpressed in cell cultures. Furthermore, conditionally knocking out CAPSL in vascular endothelial cells leads to compromised vascular development. The suppression of CAPSL in human retinal microvascular endothelial cells results in hindered tube formation, a decrease in cell proliferation, and disrupted cell polarity. Additionally, transcriptomic and proteomic profiling of these cells indicates alterations in the MYC pathway. 

      Strengths: 

      The study is nicely designed with a combination of in vivo and in vitro approaches, and the experimental results are good quality. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. Further careful examination of human genetics evidence in both patient cohorts and the general population will help to clarify. In light of human genetics, more caution needs to be exercised when interpreting results from mice and cell models and how is it related to the human patient phenotype. 

      We thank the reviewer for the careful reading and constructive suggestions. We added several experiments to address the reviewer's concerns, as follows:

      (1) The pLI score of the LOF allele of CAPSL is based on the general population, of which Europeans account for ~77% and East Asians make up less than 3%. Since the FEVR families in this article all come from China, the pLI score may not be accurate. Of course, we will continue to collect FEVR pedigrees.

      (2) We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig.S4). The lack of a pronounced phenotype in Capsl heterozygous mice may be due to differing sensitivity between humans and mice. A similar example is LRP5, mutations in which are associated with FEVR. Heterozygous mutations in LRP5 were reported in FEVR patients in multiple populations (PMID: 16929062, 33302760, 27486893, 35918671, 36411543). However, heterozygous Lrp5 knockout mice exhibited no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added in the manuscript at page 6.

      (3) We further assessed the angiogenic phenotype at P21, when angiogenesis is almost complete, and the results revealed no difference between Ctrl and CapsliECKO/iECKO mice (Fig.S5). A corresponding description was added in the manuscript at page 7.

      (4) We evaluated the expression of MYC downstream genes in vivo using lung tissue from P35 Ctrl and CapsliECKO/iECKO mice (Fig.S8). Consistent with the results from in vitro HRECs, CapsliECKO/iECKO mice showed downregulated expression of MYC targets. A corresponding description was added in the manuscript at page 11.

      Reviewer #2 (Public Review): 

      Summary: 

      This work identifies two variants in CAPSL in two-generation familial exudative vitreoretinopathy (FEVR) pedigrees, and using a knockout mouse model, they link CAPSL to retinal vascular development and endothelial proliferation. Together, these findings suggest that the identified variants may be causative and that CAPSL is a new FEVR-associated gene. 

      Strengths: 

      The authors' data provides compelling evidence that loss of the poorly understood protein CAPSL can lead to reduced endothelial proliferation in mouse retina and suppression of MYC signaling in vitro, consistent with the disease seen in FEVR patients. The study is important, providing new potential targets and mechanisms for this poorly understood disease. The paper is clearly written, and the data generally support the author's hypotheses. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      (1) Both pedigrees described appear to suggest that heterozygosity is sufficient to cause disease, but authors have not explored the phenotype of Capsl heterozygous mice. Do these animals have reduced angiogenesis similar to KOs? Furthermore, while the p.R30X variant protein does not appear to be expressed in vitro, a substantial amount of p.L83F was detectable by western blot and appeared to be at the normal molecular weight. Given that the full knockout mouse phenotype is comparatively mild, it is unclear whether this modest reduction in protein expression would be sufficient to cause FEVR - especially as the affected individuals still have one healthy copy of the gene. Additional studies are needed to determine if these variants alter protein trafficking or localization in addition to expression, and if they can act in a dominant negative fashion. 

      We thank the reviewer for the suggestion. We evaluated the phenotype of Capsl heterozygous mice at P5 (Fig.S4), and the results showed no overt difference in angiogenesis compared with littermate control mice.

      We transfected the CAPSL wild-type plasmid, the p.R30X mutant plasmid, and the p.L83F mutant plasmid into 293T cells to assess whether the intracellular localization of the CAPSL mutant proteins was altered (Fig.S1). The results showed that the point mutation did not affect the localization of the mutated protein, and a corresponding description was added in the manuscript at page 5.

      (2) The manuscript nicely shows that loss of CAPSL leads to suppressed MYC signaling in vitro. However, given that endothelial MYC is regulated by numerous pathways and proteins, including FOXO1, VEGFR2, ERK, and Notch, and reduced MYC signaling is generally associated with reduced endothelial proliferation, this finding provides little insight into the mechanism of CAPSL in regulating endothelial proliferation. It would be helpful to explore the status of these other pathways in knockdown cells but as the authors provide only GSEA results and not the underlying data behind their RNA seq results, it is difficult for the reader to understand the full phenotype. Volcano plots or similar representations of the underlying expression data in Figures 6 and 7 as well as supplemental datasets showing the differentially regulated genes should be included. In addition, while the paper beautifully characterizes the delayed retinal angiogenesis phenotype in CAPSL knockout mice, the authors do not return to that model to confirm their in vitro findings. 

      We thank the reviewer for the suggestion. Although endothelial MYC can be regulated by the FOXO1, VEGFR2, ERK, and Notch signaling pathways, these pathways are not enriched in the RNA-seq data of CAPSL-depleted HRECs. This suggests that the downregulated MYC targets may not be influenced by the signaling pathways mentioned above. RNA-seq raw data have been uploaded to the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305), and proteomic profiling raw data have been uploaded to the PRIDE archive (https://www.ebi.ac.uk/pride/archive) under the assigned accession number PXD051696. A corresponding description was added in the manuscript at pages 20-21. The datasets listing the differentially regulated genes in Figures 6 and 7 are provided as Dataset S1 and S2.

      (3) In Figure S2D, the result of this vascular leak experiment is unconvincing as no dye can be seen in the vessels. What are the kinetics for biocytin tracers to enter the bloodstream after IP injection? Why did the authors choose the IP instead of the IV route for this experiment? Differences in the uptake of the eye after IP injection could confound the results, especially in the context of a model with vascular dysfunction as here. 

      We thank the reviewer for the suggestion. In Figure S2D (now Fig.S6D), we had used a non-representative image to show vascular leakage, and we have replaced the images with more representative ones. We are sorry that we are not clear about the kinetics with which biocytin tracers enter the bloodstream after IP injection. Because the experiment was carried out at P5, IV injection was not feasible in these neonatal mice. We followed the methods described in a previous study involving mice of the same age (PMID:35361685).

      (4) In Figure 5, it is unclear how filopodia and tip cells were identified and selected for quantification. The panels do not include nuclear or tip cell-specific markers that would allow quantification of individual tip cells, and in Figure 5C it appears that some filopodia are not highlighted in the mutant panel. 

      We thank the reviewer for the comments. In Figure 5, we used HRECs to examine cell proliferation, migration, and polarity in vitro, and therefore there is no distinction between tip cells and stalk cells. The quantification of filopodia/lamellipodia was performed as in previous studies (PMID: 30783090, PMID: 28805663). Briefly, a wound scratch was made on confluent layers of transfected HRECs, and 9 hours after cell migration was initiated by the scratch, cells were fixed and stained with phalloidin. Cells at the edge of the wound were considered leader cells and were quantified for the number of filopodia/lamellipodia.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript by Liu et al. presents a case that CAPSL mutations are a cause of familial exudative vitreoretinopathy (FEVR). Attention was initially focused on the CAPSL gene from whole exome sequence analysis of two small families. The follow-up analyses included studies in which CAPSL was manipulated in endothelial cells of mice and multiple iterations of molecular and cellular analyses. Together, the data show that CAPSL influences endothelial cell proliferation and migration. Molecularly, transcriptomic and proteomic analyses suggest that CAPSL influences many genes/proteins that are also downstream targets of MYC and may be important to the mechanisms. 

      Strengths: 

      This multi-pronged approach found a previously unknown function for CAPSLs in endothelial cells and pointed at MYC pathways as high-quality candidates in the mechanism. 

      Weaknesses: 

      Two issues shape the overall impact for me. First, the unreported population frequency of the variants in the manuscript makes it unclear if CAPSL should be considered an interesting candidate possibly contributing to FEVR, or possibly a cause. Second, it is unclear if the identified variants act dominantly, as indicated in the pedigrees. The studies in mice utilized homozygotes for an endothelial cell-specific knockout, leaving uncertainty about what phenotypes might be observed if mice heterozygous for a ubiquitous knockout had instead been studied. 

      In my opinion, the following scientific issues are specific weaknesses that should be addressed: 

      (1) Please state in the manuscript the number of FEVR families that were studied by WES. Please also describe if the families had been selected for the absence of known mutations, and/or what percentage lack known pathogenic variants. 

      We thank the reviewer for the thoughtful comments. A total of 120 FEVR families were studied by WES, and we added a corresponding description in the manuscript at page 4.

      (2) A better clinical description of family 3104 would enhance the manuscript, especially the father. It is unclear what "manifested with FEVR symptoms, according to the medical records" means. Was the father diagnosed with FEVR? If the father has some iteration of a mild case, please describe it in more detail. If the lack of clinical images in the figure is indicative of a lack of medical documentation, please note this in the manuscript. 

      We thank the reviewer for thoughtful comments. The father of family 3104 has also been identified as a carrier of this heterozygous variant, manifested with FEVR symptoms, according to the medical records. Nevertheless, clinical examination images are presently unavailable. We added corresponding description in the manuscript at page 5.

      (3) The TGA stop codon can in some instances also influence splicing (PMID: 38012313). Please add a bioinformatic assessment of splicing prediction to the assays and report its output in the manuscript. 

      We thank the reviewer for the thoughtful comments. We predicted the effect of the c.88C>T variant of CAPSL on splicing using MaxEntScan (http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) and SpliceTool (https://rddc.tsinghua-gd.org/ai) (Fig.S2). Both tools were used to predict the impact of the TGA stop codon of the c.88C>T variant on the formation of a cryptic donor splice site.

      (4) More details regarding utilizing a "loxp-flanked allele of CAPSL" are needed. Is this an existing allele, if so, what is the allele and citation? If new (as suggested by S1), the newly generated CAPSL mutant mouse strain needs to be entered into the MGI database and assigned an official allele name - which should then be utilized in the manuscript and who generated the strain (presumably a core or company?) must be described. 

      We added a detailed description of the Capsl floxed allele to the Method section on pages 14-15: “Capslloxp/+ model was generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechology (Beijing, China) in C57BL/6J background and named Capslem1zxj. The genomic RNA (gRNA) sequence was as follows: Capsl-L gRNA: 5’-CTATCCCAA TTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’.”

      (5) The statement in the methods "All mice used in the study were on a C57BL/6J genetic background," should be better defined. Was the new allele generated on a pure C57BL/6J genetic background, or bred to be some level of congenic? If congenic, to what generation? If unknown, please either test and report the homogeneity of the background, or consult with nomenclature experts (such as available through MGI) to adopt the appropriate F?+NX type designation. This also pertains to the Pdgfb-iCreER mice, which reference 43 describes as having been generated in an F2 population of C57BL/6 X CBA and did not designate the sub-strain of C57BL/6 mice. It is important because one of the explanations for missing heritability in FEVR may be a high level of dependence on genetic background. From the information in the current description, it is also not inherently obvious that the mice studied did not harbor confounding mutations such as rd1 or rd8. 

      We thank the reviewer for the suggestion. We added the following description to the “Mouse model and genotyping” method section on page 14: “Capslloxp/+ model was generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechology (Beijing, China) in C57BL/6J background and named Capslem1zxj. The genomic RNA (gRNA) sequence was as follows: Capsl-L gRNA: 5’-CTATCCCAA TTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’. Pdgfb-iCreER[43] transgenic mice on a mixed background of C57BL/6 and CBA were obtained from Dr. Marcus Fruttiger and backcrossed to background for 6 generations. Capslloxp/+ mice were bred with Pdgfb-iCreER[43] transgenic mice to generate Capslloxp/loxp, Pdgfb-iCreER mice.” Sanger sequencing was performed on experimental mice to determine whether they harbor confounding mutations in genes such as Pde6b or Crb1. The results showed that the mice did not harbor confounding mutations (Fig.S9), and a corresponding description was added in the manuscript at page 15.

      (6) In my opinion, more experimental detail is needed regarding Figures 2 and 3. How many fields, of how many retinas and mice were analyzed in Figure 2? How many mice were assessed in Figure 3? 

      We thank the reviewer for the thoughtful comments. The detailed information is already presented in the manuscript; please refer to the “Methods-Quantification of retinal parameters” section for experimental details.

      (7) I suggest adding into the methods whether P-values were corrected for multiple tests. 

      We thank the reviewer for the suggestion. Statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups, or one-way ANOVA followed by Dunnett's multiple comparison test for comparisons among multiple groups. This description was added to the “Methods-Image acquisition and statistical analysis” section for clarity.
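For readers less familiar with the two-group test named above, here is a minimal sketch of the unpaired (pooled-variance) Student's t statistic using only the Python standard library. The data values and the function name `student_t` are invented for illustration and are not from the study. The p-value lookup and the ANOVA/Dunnett step are omitted, and a dedicated statistics package would normally be used in practice.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for an unpaired, equal-variance t-test."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by each group's degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical measurements for a control and a knockout group (illustrative only).
ctrl = [1.0, 2.0, 3.0]
ko = [2.0, 3.0, 4.0]
t_stat, dof = student_t(ctrl, ko)
print(f"t = {t_stat:.4f}, df = {dof}")
```

The resulting statistic would then be compared against a t distribution with the reported degrees of freedom to obtain a p-value.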

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors): 

      In summary, addressing the reviewers' concerns as outlined below could bolster the evidence from "solid" to "convincing" and further strengthen the study's impact. 

      (1) Analysis of the phenotype in CAPSL heterozygous mice, as highlighted by all 3 reviews. 

      We thank the editor for thoughtful comments. The phenotype analysis of Capsl heterozygous mice was added to Fig.S4, with the corresponding description provided at page 6.

      (2) Analysis of Capsl KO mice to determine if the pathways identified in vitro are modified (as suggested by reviewers 1 & 2). 

      We thank the editor for the suggestion. In Fig.S7, RT-qPCR was performed on lung tissues from Capsl Ctrl and KO mice to validate the expression of MYC targets in vivo. The results indicated that the downstream targets of MYC signaling were also downregulated in vivo, consistent with the in vitro findings.

      (3) Additional description of the genetic pedigrees and variants to address the points raised by reviewer #3. 

      We thank the editor for the suggestion. The father of family 3104 has also been identified as a carrier of this heterozygous variant and, according to the medical records, manifested FEVR symptoms. Nevertheless, clinical examination data are presently unavailable. We added a corresponding description in the manuscript at page 5.

      (4) Validation of the identified protein variants, especially L83F, which appears to be expressed at a near-normal level. Are these proteins mislocalized, do the variants interfere with sites of known or predicted protein-protein interactions, could they act in a dominant-negative fashion by aggregation with co-expressed WT protein, etc.? Given the comparatively weak genetic data, additional validation is required to establish the plausibility of CAPSL as a FEVR gene. 

      We thank the editor for the suggestion. As a substantial amount of p.L83F was detectable at the normal molecular weight, we further investigated whether this variant affects protein localization. As shown in Fig.S1, immunocytochemistry results indicated that this variant does not affect the subcellular localization of the protein.

      (5) Improved description of experimental details and statistical analyses as outlined by reviewer #3. 

      We thank the editor for the suggestion. More detailed information about the Capsl mice was added in the manuscript at pages 14-15. The experimental details regarding Figure 2 and Figure 3 are already presented in the “Methods-Quantification of retina parameters” section of the manuscript at pages 19-20. Statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups, or one-way ANOVA followed by Dunnett's multiple comparison test for comparisons among multiple groups. This description was added to the “Methods-Image acquisition and statistical analysis” section at page 21 for clarity.

      Reviewer #1 (Recommendations For The Authors): 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. My concerns are as follows: 

      (1) The molecular characterization of the identified mutations suggests a loss of function (LOF). Notably, in one family, both the father and son exhibit the FEVR phenotype and share the LOF mutation, suggesting a dominant mode of inheritance. However, the prevalence of the LOF allele of CAPSL in the general population is high, and its pLI score is 0, according to the GNOMAD database. This raises doubts about the LOF variant of CAPSL being causative for FEVR. 

      We thank the reviewer for the recommendation. The pLI score of the LOF allele of CAPSL is based on the general population, of which Europeans account for ~77% and East Asians make up less than 3%. Since the FEVR families in this article all come from China, the pLI score may not be accurate. Of course, we will continue to collect FEVR pedigrees and screen for CAPSL mutations.

      (2) In the conditional knockout study, a delay in vascular development is observed in the retina up to P14. What does the phenotype look like in adult mice, and does it replicate the human FEVR phenotype? 

      We thank the reviewer for the recommendation. We further assessed the phenotype at P21, when angiogenesis is almost complete, and the results showed no difference between Ctrl and CapsliECKO/iECKO mice (Fig.S5). A corresponding description was added in the manuscript at page 7.

      (3) The conditional knockout mice lack both alleles of CAPSL. The phenotype resulting from the knockout of a single allele needs investigation to align with observed human phenotypes and genetic data. 

      We thank the reviewer for the recommendation. Capsl heterozygous mice at P5 showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig.S4). The lack of a pronounced phenotype in Capsl heterozygous mice may be due to differing sensitivity between humans and mice. A similar example is LRP5, mutations in which are associated with FEVR. Heterozygous mutations in LRP5 were reported in FEVR patients in multiple populations. However, heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894).

      (4) The MYC pathway has been identified as influenced by CAPSL. Is MYC downregulation observed in the mouse model in vivo? 

      We thank the reviewer for the recommendation. MYC expression was assessed at both the mRNA and protein levels in Figure S8, and a corresponding description was added in the manuscript at page 11.

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments: 

      (1) While the authors note that little is known about CAPSL protein function, more introductory detail about the protein (structure, domains, intracellular localization, etc.) and additional discussion of potential mechanisms would aid the reader in interpreting the findings and model.

      We thank the reviewer for the recommendation. The CAPSL protein is distributed in both the nucleus and the cytoplasm (https://www.proteinatlas.org/). The immunochemistry analysis confirmed that the CAPSL protein is expressed in both the cell nucleus and the cytoplasm (Fig.S1). A corresponding description was added in the manuscript at page 5.

      (2) Pg 7 states that Capsl knockout mainly leads to "...defects in retinal vascular ECs rather than other vascular cells.". Consider rephrasing to describe "other vasculature-associated cells", as no vascular cells outside the retina were examined in the manuscript. 

      We thank the reviewer for the recommendation. We rephrased "...defects in retinal vascular ECs rather than other vascular cells." as "...defects in retinal vascular ECs rather than other vasculature-associated cells" at page 8.

      (3) The manuscript is well written but contains numerous typos. E.g. "" (Pg 14), "MCY signaling axis" (figure 6 legend), "shCAPAL" (figure 5 K). Please correct these, and search carefully for others. 

      We apologize for these careless mistakes. We have checked the manuscript and corrected them.

      Reviewer #3 (Recommendations For The Authors): 

      The following are somewhat grammatical, but significant issues, that I feel should be addressed before making the pre-print final: 

      (1) Perhaps the largest issue with the manuscript to me is whether CAPSL is an interesting candidate (as stated repeatedly) or causative of FEVR. Within the scope of what is feasible, this is a challenging problem. Since the publication of the pre-print, it would be great if another group independently reported the detection of mutations specifically in FEVR patients. That lacking, meaningful additions to the manuscript that I'd recommend are the inclusion of a paragraph on caveats of the study and reporting the allele frequencies based on public databases. As the authors know the data better than anyone and will have invested thought into the implications, they are the ones best positioned to alert the field to the study's limitations - amongst them- the factors that might practically distinguish whether CAPSL is a candidate or cause.

      We thank the reviewer for the recommendation. We will collect more samples from FEVR families and screen for other mutation sites within the CAPSL gene in further studies.

      (2) It is unclear why the modeling with mice did not attempt to recapitulate the observations in humans, i.e., why were heterozygotes for a ubiquitous knockout not studied? Any data with heterozygotes, or ubiquitous alleles (which would be easier to generate than the strain studied) should be shared in the manuscript. If no such data exists, this reviewer would find it a worthwhile new experiment to add, but it is appreciated that new experiments are sometimes beyond the scope of what is possible. At the least, this would be worthwhile to discuss in the requested caveats paragraph of the discussion. 

      We thank the reviewer for the recommendation. We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig. S4). The lack of a pronounced phenotype in heterozygous mice may be due to differing sensitivity between humans and mice. For example, heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added to the manuscript on page 6.

      (3) The statement in the Abstract "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" should be toned down, better supported, or rephrased. This appears to be the 18th disease-associated gene for FEVR, with variants identified in 4 patients of the same ethnicity. In my opinion, the word "invaluable" is currently overstated. 

      We thank the reviewer for the recommendation. We have changed "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" to "which provides valuable information for genetic counseling and prenatal diagnosis of FEVR" in the abstract.

      (4) The transcriptomic and proteomic data should be deposited into a public repository and accession numbers added to the manuscript. 

      We thank the reviewer for the recommendation. We have uploaded the raw transcriptomic and proteomic data to the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305) and the PRIDE archive (https://www.ebi.ac.uk/pride/archive), respectively.

      (5) The links to MYC are over-stated in the title "through the MYC axis", the abstract "CAPSL function causes FEVR through MYC axis", and the discussion "we demonstrated that the defects in CAPSL affect EC function by down-regulating the MYC signaling cascade". The links to MYC are entirely by association, there were no experiments testing that the transcriptomic and proteomic changes observed were determinative of the CAPSL-mediated phenotype. It seems appropriate to conjecture that these changes are important, but the above statements all need to be altered and conjectures need to be clearly identified as such. 

      We apologize for overstating the link between the CAPSL-mediated phenotype and the MYC axis in the abstract and discussion sections, and we have altered the statements in these sections to present the link as a conjecture. For example, in the abstract we changed “This study also reveals that compromised CAPSL function causes FEVR through MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” to “This study also reveals that compromised CAPSL function causes FEVR may through MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” In the discussion, we changed “…cause FEVR through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR” to “…cause FEVR may through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR”.

      (6) Finally, I suggest that the following grammatical issues in the pre-print be corrected before making the pre-print final: 

      We have checked the manuscript and corrected these mistakes.

      (a) p2. Suggest rewriting the sentence "Nevertheless, the molecular mechanisms by which CAPSL regulates cell processes and signaling cascades have yet to be elucidated." The preceding sentences only state that CAPSL is a candidate in another disease - the word "nevertheless" seems to reflect a logic that isn't described. 

      We have checked the manuscript and corrected these mistakes.

      (b) p5. Please correct the grammar "We, generated an inducible" 

      We corrected this mistake.

      (c) p5. Suggest rephrasing "impairing CAPSL expression." The word "expression" is often used in reference to transcription. To avoid confusion, something such as "eliminating or reducing protein abundance" might be better. 

      We corrected this mistake.

      (d) p6. Please correct the grammar "As expected, the radial vascular growth, as well as vessel density and vascular branching, are dramatically reduced in..." - note subject-verb agreement issue 

      We corrected this mistake.

      (e) Figure 3 legend - correct "(A) Hyloaid vessels"

      We corrected this mistake.

  3. Jul 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Fita-Torró et al. study the toxic effects of the intermediary lipid degradation product trans-2-hexadecenal (t-2-hex) on yeast mitochondria and suggest a mechanism by which Hfd1 safeguards Tom40 from lipidation by t-2-hex and its consequences, such as mitochondrial protein import inhibition, cellular proteostasis deregulation, and stress-responses. 

      The authors aimed to dissect a mechanism for the apoptotic consequences of t-2-hex in yeast, and they suggest it acts via lipidation of Tom40, but under the tested conditions nearly everything seems to be lipidated. Thus, it is unclear whether Tom40 is the crucial causal target. They also do not provide many biochemical experiments to investigate this phenomenon further functionally. Tom40 is one possible and perhaps, given the cellular consequences, a reasonable candidate, but it is not validated beyond in vitro lipidation by exogenous t-2-hex. 

      In the revised version of our manuscript, we have now included extensive new experimentation, which shows that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. In vitro chemoproteomic experiments have now been performed at a more physiological t-2-hex concentration of 10 µM, which is lower than the concentrations used in published studies with human cell models. Consistently, several TOM and TIM subunits are enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, as a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, the loss of function of the nonessential accessory Tom subunits 70 or 20 confers t-2-hex tolerance (new Fig. 8D), indicating that pre-protein import at the TOM complex is a physiological target of t-2-hex, most likely dependent on lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that mitochondrial protein import is inhibited by the lipid at low physiological doses of 10 µM and that this inhibition is modulated by the gene dose of the t-2-hex-degrading Hfd1 enzyme (new Fig. 5G).

      Strengths: 

      The effects of lipids and their metabolic intermediates on protein function are understudied; thus, the authors' research contributing to elucidating the direct effects of a single lipid is appreciated. In particular, the mechanism by which t-2-hex causes cell death in yeast is unknown. The authors elegantly use modulation of the levels of the enzyme Hfd1, which endogenously catabolizes t-2-hex, as an approach to studying t-2-hex stress. Understanding the cause and consequences of this stress is relevant for understanding fundamental regulation mechanisms, and also for human health, since the human homolog of Hfd1, ALDH3A2, is mutated in Sjögren-Larsson Syndrome. The application of a variety of global transcriptomic, functional genomic, and chemoproteomic approaches to study t-2-hex stress targets in the yeast model is laudable. 

      Weaknesses: 

      -  The extent of the contribution of Tom40 lipidation to the general t-2-hex stress phenotype is unclear. Is Tom40 lipidation alone enough to cause the phenotype? An alteration of the cysteine residue in question could help answer this key question. 

      Deletion of all four cysteine residues in Tom40 is not sufficient to confer resistance to t-2-hex stress. This result had been included in the original manuscript, but was somewhat hidden in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the Tom40 cysteine-free mutant in new Figure 8. As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This most likely implies other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the respective deletion mutants in t-2-hex tolerance assays. As shown in new Figure 8, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, and the tom20 mutant accumulates less Aim17 pre-protein upon t-2-hex stress, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      -  It is unclear whether the exogenously applied amounts of t-2-hex (concentrations chosen between 25-200 uM) are physiologically relevant in yeast cells. For comparison, Chipuk et al. (2012) used at most 1 uM on mitochondria of human cells, while Jarugumilli et al. (2018) considered 25 uM a 'lower dose' on human cells. Since the authors saw responses below 10 uM (Fig. 3B) and at the lowest selected concentration of 25 uM (Fig. 8), why were no lower, likely more specific, concentrations applied for the global transcriptomic and chemoproteomic experiments? Key experiments have to be repeated with the lower concentrations. 

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10 µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information, combining the 10 µM and 100 µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in revised Figure 8. Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10 µM in a wild-type strain, and this inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1 overexpressor. It is important to note that a dose of 10 µM of externally added t-2-hex is significantly lower than the doses applied to human cell cultures, such as in Jarugumilli et al. (2018). This proves that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      -  The amount of t-2-hex applied is especially important to consider in light of over 1300 proteins lipidated to an extent equal to or greater than Tom40 (Supp. Table 6). This chemoproteomic experiment (Fig. 8B, Supp. Table 6) is also weakened by the inclusion of only 2 replicates, thus precluding assessment of statistical significance. The selection of targets in Fig. 8B as "among the best hits" is neither immediately comprehensible nor further explained and represents at best cherry-picking. Further evidence based on statistical significance or validation by other means should be provided.

      We performed the chemoproteomic screens as described by Jarugumilli et al. (2018), with 2 replicates of mock-treated versus 2 replicates of t-2-hex-alkyne-treated cell extracts. A new chemoproteomic study with 10 µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information, combining the 10 µM and 100 µM in vitro lipidation studies (Suppl. Table 6). Differential enrichment analysis of the proteomic data was performed with the amica software (Didusch et al., 2022). Proteins were ranked according to their log2 fold induction comparing lipid- and mock-treated samples with a threshold of ≥1.5, and the adjusted p-value was calculated. Several TOM and TIM subunits were consistently identified as differentially enriched proteins, which is summarized in new Figure 8B.

      - The authors unfortunately also underuse the possible contribution of mass spectrometry technology to additionally determine the extent and localization of lipidation on a global scale (especially relevant since Cohen et al. (2020) suggest site-specific mechanisms). 

      We agree that site-specific modifications of t-2-hex will be most likely important in the inhibition or other type of regulation of specific target proteins. Our collective data show that in the case of the inhibition of mitochondrial protein import, several lipidation events on TOM and TIM are involved. Dissection of individual cysteine lipidations on those subunits will be interesting, but we feel that this is out of the scope of the present work.

      - The general novelty of studying t-2-hex stress is lowered in light of existing literature in humans (see e.g. Chipuk et al., 2012; Cohen et al., 2020; Jarugumilli et al., 2018), and in yeast by the same authors (Manzanares-Estreder et al., 2017). As the authors comment themselves, a significant part of the manuscript may represent rather a confirmation of the already described consequences of t-2-hex stress. 

      We do not agree, and we have not stated that our present study is a mere confirmation of t-2-hex stress previously applied in yeast and human models. In humans, t-2-hex has been identified as an efficient pro-apoptotic lipid, which causes mitochondrial dysfunction via direct lipidation of Bax. However, the studies of Jarugumilli et al. (2018) revealed that many other direct t-2-hex targets exist, which have remained uninvestigated to date. This work continues our previous studies (Manzanares-Estreder et al., 2017), where we showed that t-2-hex is a universal pro-apoptotic lipid applicable in yeast models, and it contributes important novel findings: the massive transcriptional response to t-2-hex, which resembles proteostatic defects; mitochondrial protein import as a physiologically important and direct target of t-2-hex; and the function of detoxifying enzymes such as Hfd1 in modulating lipid-mediated inhibition of mitochondrial protein import and general proteostasis. Additionally, we provide transcriptomic, chemoproteomic and functional genomic data to the scientific community, which will be a rich source for future studies on yet undiscovered pro-apoptotic mechanisms employed by t-2-hex. 

      Reviewer #2 (Public Review): 

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress, and causes the accumulation of mitochondrial precursor proteins in the cytosol. 

      The data shown are of high quality and well controlled. The genetic screen for mutants that are hyper- and hypo-sensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. The last part of the study is less convincing. The authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Moreover, it is not clear whether the concentrations of t-2-hex in this study are physiological. This is, however, a critical aspect. The literature is full of studies claiming the toxic effects of compounds such as H2O2; even if such studies are technically sound, they are misleading if nonphysiological concentrations of such compounds were used. 

      Nevertheless, this is an interesting study of high quality. A few specific aspects should be addressed.

      We have now performed t-2-hex toxicity assays using several mutants in Tom subunits: the cysteine-free mutant of the essential Tom40 core channel and deletion mutants in the accessory subunits Tom70 and Tom20 (new Figure 8). As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This most likely implies other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. Indeed, as shown in new Figure 8, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10 µM t-2-hex-alkyne has been conducted and the new results added to the supplementary information, combining the 10 µM and 100 µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in revised Figure 8.

      Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10 µM in a wild-type strain, and this inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1 overexpressor. It is important to note that a dose of 10 µM of externally added t-2-hex is significantly lower than the doses applied to human cell cultures, such as in Jarugumilli et al. (2018). This proves that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      Reviewer #3 (Public Review): 

      Summary: The authors investigate the effect of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in yeast using multiple omic analyses, which show that a large range of cellular functions across all compartments are affected, e.g. transcriptomic changes affect 1/3 of all genes. The authors provide additional analyses, from which they build a model in which mitochondrial protein import is blocked by modification of Tom40. 

      Strengths: Global analyses (transcriptomic and functional genomics approach) to obtain an unbiased overview of changes upon t-2-hex treatment. 

      Weaknesses: It is not clear why the authors decided to focus on mitochondria, as only 30 genes assigned to the GO term "mitochondria" are increased, and the follow-up analysis using SATAY also does not show a predominance of mitochondrial proteins (only 4 genes are identified as hits). The additional experimental data provided do not support the main claims, as protein import is not investigated, nor is there experimental evidence that lipidation of Tom40 occurs in vivo and impacts protein translocation. 

      30 mitochondrial gene functions are very strongly (>10 fold) up-regulated by t-2-hex. However, when genes up-regulated (>2 log2FC) or down-regulated (<-2 log2FC) by t-2-hex were selected and subjected to GO category enrichment analysis, we found that “Mitochondrial organization” was the most numerous GO group activated by t-2-hex, while it was “Ribosomal subunit biogenesis” for t-2-hex repression (new data in Suppl. Tables 1 and 2). 

      In the revised version of our manuscript, we have now included extensive new experimentation, which shows that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. In vitro chemoproteomic experiments have now been performed at a more physiological t-2-hex concentration of 10 µM, which is lower than the concentrations used in published studies with human cell models. Consistently, several TOM and TIM subunits are enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, as a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, the loss of function of the nonessential accessory Tom subunits 70 or 20 confers t-2-hex tolerance (new Fig. 8D), indicating that pre-protein import at the TOM complex is a physiological target of t-2-hex, most likely dependent on lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that mitochondrial protein import is inhibited by the lipid at low physiological doses of 10 µM and that this inhibition is modulated by the gene dose of the t-2-hex-degrading Hfd1 enzyme (new Fig. 5G).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Private recommendations for the authors 

      - On the existing data from Supp. Table 6, the authors may include a global assessment of whether or not the protein included a cysteine (the likely site for lipidation). 

      Although free cysteines in target proteins are the most frequent sites of modification by LDEs such as t-2-hex, other amino acids such as lysines or histidines can be lipidated by these lipid derivatives. Therefore we would like to exclude this information from our chemoproteomic data.

      - What determines whether a gene is labeled in Fig. 6B other than fold change? Why is MAC1 with the highest FC not shown? 

      We analyzed the potential anti-apoptotic SATAY hits with a log2 < -0.75 according to expected detoxification pathways (heat shock response, pleiotropic drug response), to their function in the ER (the intracellular site where t-2-hex is generated) or in mitochondria (the major t-2-hex target identified so far). This is now better described in the text. As for the potential pro-apoptotic SATAY hits, we analyzed gene functions with a log2 > 1.5 and marked the predominant groups “Cytosolic ribosome and translation” and “Amino acid metabolism”. In any case, the interested reader has all SATAY data available in supplemental tables 4 and 5 to find alternative gene functions with a potential role in cellular adaptation to t-2-hex.

      - Supplementary Table numbering should be double-checked.

      Ok, numbering has been double-checked.

      Reviewer #2 (Recommendations For The Authors): 

      Major points 

      (1) Identification of the t-2-hex target. Neither Tom70, Tom20 nor the cysteine in Tom40 is essential. If one of these components is critical for the t-2-hex-mediated toxicity, mutants should be t-2-hex-resistant. This is a straight-forward, simple, and critical experiment. 

      We have now performed t-2-hex toxicity assays in the cysteine-free Tom40 mutant and in tom20 and tom70 deletion mutants. As shown in new Figure 8, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. However, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      (2) The authors claim that t-2-hex blocks the TOM complex. Since in vitro import assays with yeast mitochondria are a well established and simple technique, the authors should isolate mitochondria from their cells and perform import experiments. It is expected that those mitochondria show reduced import rates, however, swelling of these mitochondria to mitoplasts should suppress the import defect. 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we determine the in vivo accumulation of several mitochondrial precursor proteins, which is widely used to assay the efficiency of mitochondrial protein import. For example, the recent hallmark paper discovering the mitoCPR protein import surveillance pathway exclusively uses epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Additionally, our new results that mutants in the accessory TOM subunits 20 and 70 are hyperresistant to t-2-hex (Figure 8D) and that deletion of TOM20 decreases the t-2-hex-induced pre-protein accumulation (Suppl. Figure 1) identify the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target.

      (3) The first part of the study is very strong. The last figure is also of good quality, however, it is not clear whether the effects on TOM subunits are really causal for the observed t-2-hex effect on gene expression. The authors might cure this by improved data or by avoiding bold statements such as: 'Hfd1 associates with the Tom70 subunit of the TOM complex and t-2-hex covalently lipidates the central Tom40 channel, which altogether indicates that transport of mitochondrial precursor proteins through the outer mitochondrial membrane is directly inhibited by the pro-apoptotic lipid and thus represents a hotspot for pro- and anti-apoptotic signaling.' (Abstract). 

      We now show that several TOM and TIM subunits are lipidated in vitro by physiologically low t-2-hex concentrations, that loss of function of the accessory subunits Tom20 or Tom70 rescues t-2-hex toxicity (new Figure 8), and that the gene dose of Hfd1 determines the degree of the mitoprotein import block (new Figure 5). These data identify the TOM complex as a physiologically important target of the pro-apoptotic lipid. The Abstract has been modified accordingly.

      (4) If the t-2-hex levels are in a physiological range, one would expect that overexpression of Hfd1 prevents the t-2-hex-induced import arrest.

      We have now confirmed that overexpression of Hfd1 indeed prevents inhibition of mitochondrial protein import by t-2-hex. As shown in new Figure 5, Aim17 mitochondrial import is already inhibited by t-2-hex doses as low as 10 µM in a wild-type strain, and this inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1 overexpressor.

      (5) The authors claim that Fmp52 is a t-2-hex-detoxifying enzyme, but do not show evidence. They should rewrite this sentence and be more cautious, or they should show that increased Fmp52 levels indeed deplete t-2-hex from mitochondria.  

      We show that loss of Fmp52 function leads to a strong t-2-hex sensitivity. Fmp52 belongs to the NAD-binding short-chain dehydrogenase/reductase (SDR) family and localizes to highly purified mitochondrial outer membranes (Zahedi et al, 2006). These are the indications that suggest that Fmp52 participates in the enzymatic detoxification of t-2-hex in addition to Hfd1. The Results section has been modified accordingly.

      Minor points: 

      (6) Aim17 was recently identified as a characteristic constituent of cytosolic protein aggregates named MitoStores (Krämer et al., 2023, EMBO J). The authors might test whether the cytosolic Aim17 protein colocalizes with the Hsp104-GFP granules that accumulate upon t-2-hex exposure as shown in Fig. 4A. 

      We agree that determining the fate of unimported mitochondrial precursors upon t-2-hex stress would be interesting. We have made some attempts to co-visualize Aim17-dsRed and Hsp104-GFP upon t-2-hex treatment, but we still have some technical issues. While we clearly see that Aim17 accumulates in cytoplasmic foci upon prolonged t-2-hex exposure, we are not able to determine colocalization with Hsp104, in great part because t-2-hex causes mitochondrial fragmentation, which leads to the appearance of Aim17-stained foci in the cytosol independently of protein aggregates. As we are so far not able to localize Aim17 unambiguously in Hsp104-containing aggregates (MitoStores) upon lipid stress, we would like to move the manuscript forward without those experiments.

      (7) In Fig. 1A, the figures of the different lines are difficult to distinguish. Lines of one color with different intensities would be better suited. 

      We have been working before with dose-response profiles generated by the destabilized luciferase system and found that the color-coded representation of the plots is the most effective way to represent the data, see for example Fita-Torró et al. Mol Ecol. 2023 32(13):3557-3574, Pascual-Ahuir et al. BBA 2019 1862(4):457-471, Rienzo et al., Mol Cell Biol. 2015 35(21):3669-83, and several other publications. Therefore we want to keep the format of the Figure.

      (8) A title page should be added to each of the supplemental data files with short descriptions of the information that is provided in the columns of the tables.

      Explanatory title pages have now been added to the supplemental data files.

      Reviewer #3 (Recommendations For The Authors): 

      Figure 5A: The authors aim to assess protein import, however, their experimental set-up is not suited and does not allow conclusions about protein translocation into mitochondria. The authors monitor protein steady state levels, which does not reflect import capacity. For this e.g. pulse-chase experiments coupled to coIP or in organello import assays with radiolabeled substrate proteins would be required. In addition, the authors lack a non-treated control to show that no precursor accumulates in the absence of CCCP and t-2-hex. At the moment, the conclusion of blocked import cannot be made, as there are many other explanations for the observed steady state levels, e.g. the TAP tag interfered with the import competence of the precursor or t-2-hex could impact on MPP function (in particular as Figure 8B shows that also intra-mitochondrial proteins undergo modification by t-2-hex). 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we determine the in vivo accumulation of several mitochondrial precursor proteins, which is widely used to assay the efficiency of mitochondrial protein import. For example, the recent hallmark paper discovering the mitoCPR protein import surveillance pathway exclusively uses epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Figure 5 contains several non-treated control experiments, which show that no (or, in the case of Ilv6, fewer) precursors of TAP-tagged Aim17, Cox5a, Ilv6, or Sdh4 accumulate in the absence of CCCP or t-2-hex. This is shown in Figure 5A for untreated cells and in Figure 5B and new Figure 5G for solvent (DMSO)-treated cells. This demonstrates that the TAP tag does not interfere with the import of the respective precursors. Additionally, our new results that mutants in the accessory TOM subunits 20 and 70 are hyperresistant to t-2-hex (Figure 8D) identify the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target.

      Figure 8: The conclusion that Tom40 is directly lipidated comes from an in vitro assay, with the conclusion that Tom40 is the main target, because it is the only Tom protein with a cysteine (Tom70, not being part of the Tom core, is excluded; however, lack of Tom70 function would also have detrimental consequences for mitochondrial protein import). However, there is no experiment showing a modification of Tom40 and a consequence for protein import. The proposed model is therefore very far-fetched, and several aspects are speculation not supported by experimental data. To propose such a model, the authors need to show experimental evidence, e.g. by generating a yeast strain in which the cysteines in Tom40 are replaced by e.g. serine residues, and then assessing whether protein import (e.g. in pulse-chase assays) is no longer affected upon addition of t-2-hex.

      Deletion of all four cysteine residues in Tom40 is not sufficient to confer resistance to t-2-hex stress. This result had been included in the original manuscript, but was somewhat hidden in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the Tom40 cysteine-free mutant in new Figure 8D. As a result, cysteine lipidation of Tom40 alone is not sufficient to confer t-2-hex toxicity. This most likely implies other lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the tolerance of the respective deletion mutants in t-2-hex tolerance assays. As shown in new Figure 8D, the absence of Tom70 and Tom20 function significantly increases tolerance to t-2-hex, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      Figure 8A: The pulldown experiments lack positive (other Tom subunits) and negative controls and were performed with (large) tags on all proteins, which can easily result in false positive interactions. The conclusion that Hfd1 interacts with Tom70 and Tom22 cannot be made. Also, the conclusion if an interaction is robust or not cannot be made as the pull-down lacks control fractions, it is also not clear how much of the eluate was loaded. Finally, Hfd1-HA was not expressed from its endogenous promoter, likely resulting in over-expression, which again strongly hampers conclusions about bona fide interaction partners. 

      We agree that our pulldown studies are done in an artificial context, such as the Hfd1 overexpression needed to obtain protein levels sufficient for detection, or the use of Tap-fusion proteins. However, the conclusion that Tom70 is a potential interactor of Hfd1 can be made based on the following observations: Hfd1-HA is preferentially pulled down from total protein extracts containing Tom70-Tap, but not from extracts containing no Tap-protein, and significantly less from extracts containing Tom22-Tap, another TOM-associated subunit. The pulldown assay has now been repeated several times, and the efficiency of Hfd1 pulldown has been quantified and statistically analyzed with respect to the quantity of purified Tom protein, which is shown in modified Figure 8A.

      Figure 4A and C: Depletion of proteasomal activity results in larger aggregates in Figure 4A. However, the addition of t-2-hex blocks proteasomal activity (Figure 4C). How can proteasome inhibition result in bigger aggregates if the proteasomal activity is lost upon t-2-hex addition?

      The negative effect of t-2-hex on proteasomal activity is most likely an indirect effect caused by protein aggregation (Bence et al., Science 2001 292-1552) and occurs in wild type and rpn4 mutant cells with reduced proteasomal activity (Fig. 4C). t-2-hex causes cytosolic protein aggregation in wild type cells, which is aggravated (more and larger protein aggregates) in rpn4 mutants because of their lower levels of active proteasome (Fig. 4A). The observed protein aggregates will further diminish proteasomal activity, which is confirmed in Fig. 4C. 

      Figure 1B: The authors use a reporter to determine HFD1 expression that consists of the promoter region of HFD1 fused to luciferase. These fusion constructs have been shown to often not reflect the bona fide expression levels of genes (Yoneda et al., J Cell Sci 2004). qPCR analysis of transcript levels should be included to support the induction of HFD1. 

      We agree that the live cell luciferase reporters used here are not suitable for the determination of absolute mRNA levels. However, the aim of these reporter experiments is to quantify the inducibility of different genes (HFD1, GRE2) dependent on increasing stress doses. These dose response profiles cannot be obtained by qPCR analysis, while the destabilized reporters are an excellent tool for this, which have been used to accurately describe numerous dynamic stress responses (for example: Dolz-Edo et al. 2013 MCB 33:2228-40, Rienzo et al. 2015 MCB 35:3669-83, Pascual-Ahuir et al. 2019 BBA 862:457-471). Additionally, the induction of HFD1 mRNA levels by salt (NaCl) and oxidative (menadione) stress determined by qPCR has been published before (Manzanares-Estreder et al. 2017 Oxid Med Cell Longevity 2017:2708345).

      The authors conclude from Figure 1 that entry into apoptotic cell death is modulated by efficient t-2-hex detoxification. However, this is based on growth curves and no analysis of apoptotic cell death is performed. The data show that the addition of hexadecenal results in a growth arrest, that is overcome likely upon degradation of t-2-hex (depending on the amount of Hfd1). 

      We agree that our experiments measure growth inhibition and not specifically apoptotic cell death. The text has been changed accordingly.  

      Figure 4A: Microscopy images show between 1-2 yeast cells. Either more cells need to be shown or quantifications of the aggregates are required. In addition, it is not clear if the control received the same DMSO concentration as the treated cells and also the time point for the control is not specified. 

      We have now quantified the number of aggregates across cell populations in new Figure 4A in DMSO, t-2-hex and t-2-hex-H2 treated wt and rpn4 mutants. These data show specific aggregate induction by t-2-hex and not by DMSO or the saturated t-2-hex-H2 control, which is aggravated in rpn4 mutants and avoided by CHX pretreatment.

      Figure 5: Western blots in Figure 5A, B, D, E and F lack a loading control. Without this, conclusions about increases in protein abundance cannot be made.

      We have now included additional panels with the loading controls for the Western blots in new Figure 5, except Figure 5B, where the appearance or not of the pre-protein can be compared to the amount of mature protein in the same blot.

      Figure 2B: Complex II assembly factors SDH5,6,9 are described here as ETC complexes. As the proteins are not part of the mature complex II, the heading should be modified into ETC complexes and ETC assembly.

      Figure 2B has been revised and the classification of ETC proteins changed accordingly.

    1. Author response

      Reviewer #1 (Public Review):

      The authors use neural recordings from three different brain areas to assess whether the type of evidence accumulation dynamics in those regions are (1) similar to one another, and (2) similar to best-fitting evidence accumulation dynamics to behavioral choice alone. This is an important theoretical question because it relates to the 'linking hypothesis' that relates neurophysiological data to psychological phenomena. Although the standard evidence accumulation dynamic in describing choice has been the gradual accumulation of evidence, the authors find that those dynamics are not represented equally in all brain regions. Such results suggest that more nuanced computational models are needed to explain how brain areas interact to produce decisions, and the focus of theoretical development should shift away from explaining behavioral patterns alone and more toward explaining both brain and behavioral interactions. Given that the authors simply test the assumption that the same dynamics that best explain behavior should also explain neural data, they accomplish their objective using a sophisticated methodology and find evidence *against* this assumption: they find that each region was best described by a distinct accumulation model, which all differed from the model that best described the rat's choices.

      I thought this was an excellent paper with a clear scientific objective, direct analysis to achieve that objective, and a very strong methodological approach to leave little doubt that the conclusions they drew from their analyses were as reasonable and accurate as possible.

      We thank the reviewer for their time and appreciate their generous comments.

      Reviewer #2 (Public Review):

      The neural dynamics underlying decision-making have long been studied across different species (e.g., primates and rodents) and brain areas (e.g., parietal cortex, frontal eye fields, striatum). The key question is to what extent neural firing rates covary with evidence accumulation processes as proposed by evidence accumulation models. It is often assumed that the evidence-accumulation process at the neural level should mirror the evidence-accumulation process at the behavioral level. The current paper shows that the neural dynamics of three rat brain regions (the FOF, ADS, and PCC) all show signatures of evidence accumulation, but in distinct ways. Especially the role of the FOF appears to be distinct, due to its dependence on early evidence compared to the other regions. This sheds new light and a new interpretation of the role of the FOF in decision-making - previously, it has been described as a region encoding the choice that is currently being committed to; this new analysis suggests it is instead strongly influenced by early evidence.

      A major strength of the paper is that the results are achieved through joint modelling of the behavioral and neural data, combined with information on the physical stimulus at hand. Joint models were shown to provide more information on the underlying processes compared to behavioral or neural models alone. Especially the inclusion of the neural data seemed to have greatly improved the quality of inferences. This is a key contribution that illustrates that the sophisticated modelling of multiple sources of data at the same time, pays off in terms of the quality of inferences. Yet, it should be added here, that due to the nature of the task, the behavioral data contained only choices, and not response times, which tend to contain more information regarding the evidence accumulation process than choice alone. It would be interesting to additionally discuss how choice decision times can be modeled with the proposed modelling framework.

      We thank the reviewer for their generous views on our work. We agree that adding decision times, which could readily be added to our framework, will likely further constrain the inference of the latent model. We are currently pursuing such topics using this framework and appropriate data. We have altered a passage in our Discussion, where we note the various extensions of our model one could pursue, to include response time within the set of behavioral measurements one might include.

      A main limitation of the paper is that it does not appear to address a seemingly logical follow-up question: If these three brain regions individually accumulate evidence in distinct manners, how do these multiple brain regions then each contribute to a final choice? The joint models fit each region's data separately, so how well does each region individually 'explain' or 'predict' behavior, and how does the combined neural activity of the regions lead to manifest behavior? I would be very interested in the authors' perspectives on these questions.

      We could not share the reviewer's view of and interest in this question with any more excitement than we already do! Unfortunately, the experiments necessary for answering this question in a satisfying way have not yet been performed (e.g. simultaneous multi-region population recordings). Additionally, our analysis approach, as presented currently, would require some technical alterations to deal with data at that scale. Both efforts are underway, but we feel that the current manuscript describes the basic modeling framework one would need to use to address these questions if and when such data exist. We have added some text to the Discussion to highlight these exciting future directions:

      “An exciting future application of our modeling framework is to model multiple, independent accumulators in several brain regions which collectively give rise to the animal’s behavior. Such a model would provide incredible insight into how the brain collectively gives rise to behavioral choices.”

      There are some remaining questions regarding the specific models used, that I was hoping the authors could clarify. Specifically, in equations 10-11, I was wondering to what extent there might be a collinearity issue. Equation 10 proposes that the firing rates of neurons can vary across time due to two mechanisms: (1) The dependence of the firing rate on the accumulated evidence, and (2) a time-varying trial average (as detailed in Equation 11). If firing rates of the neuron indeed covary with the accumulated evidence and therefore increase across time, how can the effects of mechanisms 1 and 2 be disentangled? Relatedly, the independent noise models model each neuron separately and thereby include many more parameters, each informed by less data. Is it possible that the relatively poor cross-validation of the independent noise model may be a consequence of the overfitting of the independent noise model?

      Thank you for this important observation. Please see our response to the essential revisions above which addresses this issue. In short, although it is true that firing rates increase with time (with accumulating evidence) they do so in a way that depends on the stimulus, and so just as often as they increase with time, they decrease.

      Regarding the poor cross-validation of the independent noise model, we apologize for the confusion here: both the shared and independent noise models have exactly the same number of parameters. They differ only in that the latent process for a trial contains a unique noise instantiation per trial for the independent noise model and the same instantiation for the shared model. See above for our response to this issue, and how the manuscript was modified in light of this confusion.
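
      The distinction between the two noise models can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `simulate_latents`, the simple additive accumulator, and the inputs `clicks` and `sigma` are all assumptions made here for illustration. The point it shows is that both variants are governed by the same parameter (`sigma`); they differ only in whether every neuron reads the same noisy latent trajectory or each neuron receives its own noise draw.

```python
import random


def simulate_latents(clicks, n_neurons, sigma, shared, seed=0):
    """Return one latent trajectory per neuron for a single trial.

    clicks: signed evidence increments per time bin (hypothetical input).
    sigma:  accumulator noise standard deviation; the only noise parameter,
            and it is identical for the shared and independent variants.
    shared: if True, all neurons read the same noise realization;
            if False, each neuron gets its own realization.
    """
    rng = random.Random(seed)

    def one_path():
        a, path = 0.0, []
        for c in clicks:
            a += c + rng.gauss(0.0, sigma)  # evidence plus accumulation noise
            path.append(a)
        return path

    if shared:
        path = one_path()
        return [list(path) for _ in range(n_neurons)]
    return [one_path() for _ in range(n_neurons)]


shared_latents = simulate_latents([1, -1, 1, 1], n_neurons=3, sigma=0.5, shared=True)
independent_latents = simulate_latents([1, -1, 1, 1], n_neurons=3, sigma=0.5, shared=False)
```

      In this toy version the parameter count is the same in both cases, so a difference in cross-validated performance between the two variants would not be attributable to extra free parameters.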

      Another related question is how robust the parameter recovery properties of these models are under a wider range of data-generating parameter settings. I greatly appreciate the inclusion of a parameter recovery study (Figure S1C) using a single synthetic dataset, but it could be made even stronger by simulating multiple datasets with a wider range of parameter settings. Such a simulation study would help understand how robust and reliable the estimated parameters of all models are. Similarly, it would be helpful if also the \theta_{y} parameters are shown, which aren't shown in Figure S1C.

      We agree that understanding the model fitting behavior under a wider set of parameter settings is valuable. We fit our model to additional sets of parameter settings and included an additional supplemental figure (Figure 1 — figure supplement 2) to illustrate these results. In short, we found that parameter recovery was robust across the different parameter settings we tested. We also updated Figure S1C with the neural parameters. We included the following in the Results to note that parameter recovery was robust:

      “We verified that our method was able to recover the parameters that generated synthetic physiologically-relevant spiking and choices data (Figure 1 — figure supplement 1), and that parameter recovery was robust across a range of parameter values (Figure 1 — figure supplement 2).”

      An aspect of the paper that initially raised confusion with me is that the models fit on the choice data and stimulus information alone, make different predictions for the evidence accumulation dynamics in different regions (e.g., Figure 5A, 6A) and also led to different best-fitting parameters in different regions (Figure S9A). It took me a while to realize that this is due to the data being pooled across different rats and sessions - as such, the behavioral choice data are not the same across regions, and neither is the resulting fit models. This could easily be clarified by adding a few notes in the captions of the relevant figures.

      Thanks for pointing this out. We agree that this tends to be a point of confusion, and we have added clarification prior to Fig 3, where the choice model is first introduced:

      “We stress that because of this, each fitted choice model uses different behavioral choice data, and thus the fitted parameters vary from fitted model to fitted model.”

      Combined, this manuscript represents an interesting and welcome contribution to an ongoing debate on the neural dynamics of decision-making across different brain regions. It also introduced new joint modelling techniques that can be used in the field and raised new questions on how the concurrent activity of neurons across different brain regions combined leads to behavior.

      We appreciate the very generous views on our work!

    1. Author response:

      eLife assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      We are troubled by the statement that our paper is narrow in scope and that evidence supporting our conclusions is incomplete. We do not feel the reviews as presented substantiate drawing this conclusion about our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the method describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well-characterized antibiotics against known N. gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates as suggested in comment 2 was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strains AR1280 and AR1281 (CDC), and Neisseria meningitidis ATCC 13090. They will be tested when they arrive.
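
      The sequence comparison described in point (1) above can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline used in the study: it assumes the sequences have already been aligned to the reference by an external tool (gaps written as `-`), and the function names and the toy sequences are hypothetical. It reports substitutions in reference coordinates (for example `R3K`) and checks whether any fall at positions of interest, analogous to checking RplL positions 76 and 84.

```python
def substitutions(reference, aligned):
    """List amino acid substitutions of `aligned` relative to `reference`.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Positions are 1-based and count reference residues only.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    pos = 0
    for ref_aa, qry_aa in zip(reference, aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        pos += 1
        if qry_aa != "-" and qry_aa != ref_aa:
            changes.append(f"{ref_aa}{pos}{qry_aa}")
    return changes


def hits_positions(changes, positions_of_interest):
    """Return the substitutions that fall on any of the given 1-based positions."""
    return [c for c in changes if int(c[1:-1]) in positions_of_interest]


# Toy example with made-up sequences (not real DedA or RplL sequences).
toy_reference = "MKRAG"
toy_strain = "MKKAG"
toy_changes = substitutions(toy_reference, toy_strain)
```

      Applied across a panel of strains, an empty result for every strain corresponds to the "no amino acid changes" outcome reported for DedA, while a filter such as `hits_positions` separates changes at resistance-associated sites from changes elsewhere.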

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining mutants in the lab was a very low yield endeavor.

      Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify the target of action to rplL and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

      We thank the reviewer for the positive comment. We agree that investigating factors that could compensate for the fitness attenuation caused by DedA mutation would enhance our understanding of the role of DedA.

    1. Author response:

      We thank you for the opportunity to provide a concise response. The criticisms are accurately summarized in the eLife assessment:

      the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The essence of our study is to propose the adoption of the Haldane model of genetic drift, based on the branching process, in lieu of the Wright-Fisher (WF) model, based on sampling, usually binomial.  In addition to some extensions of the Haldane model, we present 4 paradoxes that cannot be resolved by the WF model. The reviews suggest that some of the paradoxes could be resolved by the WF model, if we engage prior literature sufficiently.

      We certainly could not review all the literature on genetic drift, as there must be thousands of papers. Nevertheless, the literature we do not cover is based on the WF model, and so has the general properties that all modifications of the WF model share. (We should note that all such modifications share the sampling aspect of the WF model. To model such sampling, N is imposed from outside of the model, rather than self-generating within the model. Most important, these modifications are mathematically valid but biologically untenable, as will be elaborated below. Thus, in concept, the WF and Haldane models are fundamentally different.)

      In short, our proposal is general with the key point that the WF model cannot resolve these (and many other) paradoxes.  The reviewers disagree (apparently only partially) and we shall be specific in our response below.

      We shall first present the 4th paradox, which is about multi-copy gene systems (such as rRNA genes and viruses, see the companion paper). Viruses evolve both within and between hosts. In both stages, there are severe bottlenecks.  How does one address the genetic drift in viral evolution? How can we model the effective population sizes both within- and between- hosts?  The inability of the WF model in dealing with such multi-copy gene systems may explain the difficulties in accounting for the SARS-CoV-2 evolution. Given the small number of virions transmitted between hosts, drift is strong which we have shown by using the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023). 

      As the reviewers suggest, it is possible to modify the WF model to account for some of these paradoxes. However, the modifications are often mathematically convenient but biologically dubious. Much of the debate is about the progeny number, k. (We shall use a haploid model for this purpose, but diploidy does not pose a problem, as stated in the main text.) The modifications relax the constraint of V(k) = E(k) inherent in the WF sampling. One would then ask how V(k) can be different from E(k) in the WF sampling even though it is mathematically feasible (but biologically dubious). Kimura and Crow (1963) may be the first to offer a biological explanation. If one reads it carefully, Kimura's modification is to make the WF model like the Haldane model. Then, why don't we use the Haldane model in the first place by having two parameters, E(k) and V(k), instead of the one-parameter WF model?
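
      To make the constraint V(k) = E(k) concrete, here is a minimal Python sketch of haploid WF sampling at constant size N. It is an illustration written for this response, not code from the study, and the function names and parameter values are assumptions. Each of the N offspring picks its parent uniformly at random, so the progeny number of a parent is Binomial(N, 1/N), with mean 1 and variance 1 - 1/N; the simulation simply confirms that the variance is pinned near the mean.

```python
import random
from statistics import mean, pvariance


def wf_offspring_counts(n, rng):
    """One Wright-Fisher generation: each of n offspring picks a parent uniformly."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def wf_offspring_moments(n, generations, seed=0):
    """Pool per-parent progeny numbers over generations; return (E(k), V(k))."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(generations):
        pooled.extend(wf_offspring_counts(n, rng))
    return mean(pooled), pvariance(pooled)


e_k, v_k = wf_offspring_moments(n=200, generations=200)
```

      Because population size is fixed from outside, the mean is exactly 1, and the sampling scheme leaves no free parameter with which to set V(k) independently of E(k). That is the sense in which the WF model is a one-parameter model here.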

      The Haldane model is conceptually simpler. It allows the variation in population size, N, to be generated from within the model, rather than artificially imposed from outside of the model.  This brings us to the first paradox, the density-dependent Haldane model. When N is increasing exponentially as in bacterial or yeast cultures, there is almost no drift when N is very low and drift becomes intense as N grows to near the carrying capacity.  We do not see how the WF model can resolve this paradox, which can otherwise be resolved by the Haldane model.

      The second and third paradoxes are about how much mathematical models of population genetics can be detached from biological mechanisms. The second paradox, about sex chromosomes, is rooted in the realization of V(k) ≠ E(k). Since E(k) is the same between sexes but V(k) is different, how does the WF sampling give rise to V(k) ≠ E(k)? We are asking the biological question that troubled Kimura and Crow (1963), alluded to above. The third paradox is acknowledged by two reviewers. Genetic drift manifested in the fixation probability of an advantageous mutation is 2s/V(k). It is thus strange that the fundamental parameter of drift in the WF model, N (or Ne), is missing. In the Haldane model, drift is determined by V(k) with N being a scaling factor; hence 2s/V(k) makes perfect biological sense.
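
      The dependence of the fixation probability on V(k) rather than on N can be checked numerically in a branching process. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a negative binomial progeny distribution with mean 1 + s and dispersion r (so that V(k) = E(k) + E(k)^2 / r), and the function names are chosen here. The survival probability of a single new mutant is obtained as one minus the smallest fixed point of the progeny probability generating function, and is compared with the small-s approximation 2s/V(k).

```python
def nb_pgf(z, mean, r):
    """Probability generating function of a negative binomial with given mean and dispersion r."""
    return (1.0 + mean * (1.0 - z) / r) ** (-r)


def survival_probability(s, r, tol=1e-13, max_iter=1_000_000):
    """Survival probability of a lineage started by one mutant with E(k) = 1 + s.

    Iterating q <- G(q) from q = 0 converges to the extinction probability,
    the smallest fixed point of the generating function G on [0, 1].
    """
    mean_k = 1.0 + s
    q = 0.0
    for _ in range(max_iter):
        q_next = nb_pgf(q, mean_k, r)
        if abs(q_next - q) < tol:
            return 1.0 - q_next
        q = q_next
    raise RuntimeError("fixed-point iteration did not converge")


def two_s_over_v(s, r):
    """Small-s approximation 2s / V(k) for the same progeny distribution."""
    mean_k = 1.0 + s
    variance_k = mean_k + mean_k ** 2 / r
    return 2.0 * s / variance_k


exact = survival_probability(s=0.02, r=1.0)
approx = two_s_over_v(s=0.02, r=1.0)
```

      With r = 1 the progeny distribution is geometric and the exact survival probability is s / (1 + s), which the iteration reproduces; the 2s/V(k) approximation agrees with it to within a few percent for this s. No population size enters either calculation, only the mean and variance of k.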

      We now answer the obvious question: If the model is fundamentally about the Haldane model, why do we call it the WF-Haldane model? The reason is that most results obtained by the WF model are pretty good approximations and the branching process may not need to constantly re-derive the results.  At least, one can use the WF results to see how well they fit into the Haldane model. In our earlier study (Chen, et al. (2017); Fig. 3), we show that the approximations can be very good in many (or most) settings.

      We would like to use the modern analogy of gas-engine cars vs. electric-motor ones. The Haldane model and the WF model are as fundamentally different concepts as the driving mechanisms of gas-powered vs electric cars.  The old model is now facing many problems and the fixes are often not possible.  Some fixes are so complicated that one starts thinking about simpler solutions. The reservations are that we have invested so much in the old models which might be wasted by the switch. However, we are suggesting the integration of the WF and Haldane models. In this sense, the WF model has had many contributions which the new model gratefully inherits. This is true with the legacy of gas-engine cars inherited by EVs.

      The editors also issue the instruction: while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      We are thankful to the editors and reviewers for the thoughtful comments and constructive criticisms. We also appreciate the publishing philosophy of eLife that allows exchanges, debates and improvements, which are the true spirits of science publishing.

      References for the provisional author responses

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol. Biol. Evol. 34:2057-2064.

      Hou M, Shi J, Gong Z, Wen H, Lan Y, Deng X, Fan Q, Li J, Jiang M, Tang X, et al. 2023. Intra- vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted. Mol. Biol. Evol. 40.

      Kimura M, Crow JF. 1963. The measurement of effective population number. Evolution:279-288.

      Ruan Y, Luo Z, Tang X, Li G, Wen H, He X, Lu X, Lu J, Wu CI. 2021. On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all? Natl. Sci. Rev. 8:nwaa246.

      Ruan Y, Wen H, He X, Wu CI. 2021. A theoretical exploration of the origin and early evolution of a pandemic. Sci Bull (Beijing) 66:1022-1029.

      Review comments

      eLife assessment 

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution.

      It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      We do not believe that the paradoxes can be resolved.

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims. 

      Weaknesses: 

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more. 

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, which might be my own fault. For instance, I can't tell if Equation 5 is a result or an assumption - when I attempted a naive derivation of Equation 5, I obtained E(K_t) = 1 + r/c*(c-n)*dt. It's unclear where the parameter z comes from, for example. Similarly, is equation 6 a derivation or an assumption? Finally, I'm not 100% sure how to interpret equation 7. Is that a variance effective size at time t? Is it possible to obtain something like a coalescent Ne or an expected number of segregating sites or something from this? 

      Similarly, I don't understand their simulations. I expected that the authors would do individual-based simulations under a stochastic model of logistic growth, and show that you naturally get variance in offspring number that changes over time. But it seems that they simply used their equations 5 and 6 to fix those values. Moreover, I don't understand how they enforce population regulation in their simulations---is N_t random and determined by the (independent) draws from K_t for each individual? In that case, there's no "interaction" between individuals (except abstractly, since logistic growth arises from a model that assumes interactions between individuals). This seems problematic for their model, which is essentially motivated by the fact that early during logistic growth, there are basically no interactions, and later there are, which increases variance in reproduction. But their simulations assume no interactions throughout! 

      The authors also attempt to show that changing variance in reproductive success occurs naturally during exponential growth using a yeast experiment. However, the authors are not counting the offspring of individual yeast during growth (which I'm sure is quite hard). Instead, they use an equation that estimates the variance in offspring number based on the observed population size, as shown in the section "Estimation of V(K) and E(K) in yeast cells". This is fairly clever, however, I am not sure it is right, because the authors neglect covariance in offspring between individuals. My attempt at this derivation assumes that I_t | I_{t-1} = \sum_{i=1}^{I_{t-1}} K_{i,t-1} where K_{i,t-1} is the number of offspring of individual i at time t-1. Then, for example, E(V(I_t | I_{t-1})) = E(V(\sum_{i=1}^{I_{t-1}} K_{i,t-1})) = E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))*Cov(K_{i,t-1},K_{j,t-1}). The authors have the first term, but not the second, and I'm not sure the second can be neglected (in fact, I believe it's the second term that's actually important, as early on during growth there is very little covariance because resources aren't constrained, but at carrying capacity, an individual having offspring means that another individual has to have fewer offspring - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim. 
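The role of the covariance term in the decomposition above can be illustrated with a small numerical check. The sketch below is not part of the review or the manuscript: it conditions on a fixed number of parents n (the reviewer's expression takes a further expectation over I_{t-1}), and the function names are hypothetical. It uses the standard multinomial moments of Wright-Fisher sampling, Var(K) = 1 - 1/n and Cov(K_i, K_j) = -1/n, as an example of a population held at constant size.

```python
def variance_of_total(n_parents, var_k, cov_k):
    """Var(sum_i K_i) for n exchangeable parents:
    n * V(K) + n * (n - 1) * Cov(K_i, K_j)."""
    return n_parents * var_k + n_parents * (n_parents - 1) * cov_k


def wright_fisher_moments(n):
    """Offspring-number moments under multinomial (Wright-Fisher) sampling
    of n offspring from n equally likely parents."""
    var_k = 1.0 - 1.0 / n   # n * (1/n) * (1 - 1/n)
    cov_k = -1.0 / n        # -n * (1/n) * (1/n)
    return var_k, cov_k


n = 100
var_k, cov_k = wright_fisher_moments(n)

# Dropping the covariance term suggests a large variance in the next census size.
without_cov = variance_of_total(n, var_k, 0.0)

# Including it gives (numerically) zero, as it must when the size is fixed at n.
with_cov = variance_of_total(n, var_k, cov_k)

print(without_cov, with_cov)
```

In this constant-size example the two terms cancel, so an estimator that keeps only the first term would attribute all of the (absent) census-size variance to V(K). How large the covariance is during the yeast growth phases is an empirical question that this sketch does not address.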

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues. First, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show. 

      References: 

      Möhle M. Robustness results for the coalescent. Journal of Applied Probability. 1998;35(2):438-447. doi:10.1239/jap/1032192859 

      Sagitov S. Convergence to the coalescent with simultaneous multiple mergers. Journal of Applied Probability. 2003;40(4):839-854. doi:10.1239/jap/1067436085 

      Der, Ricky, Charles L. Epstein, and Joshua B. Plotkin. "Generalized population models and the nature of genetic drift." Theoretical population biology 80.2 (2011): 80-99 

      Sano, Akinori, Akinobu Shimizu, and Masaru Iizuka. "Coalescent process with fluctuating population size and its effective size." Theoretical population biology 65.1 (2004): 39-48 

      Sjodin, P., et al. "On the meaning and existence of an effective population size." Genetics 169.2 (2005): 1061-1070 

      Reviewer #2 (Public Review): 

      Summary: 

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size. 

      Strengths: 

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems. 
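The 2s/V(K) approximation mentioned above can be compared against an exact branching-process calculation in a simple case. The following is a minimal sketch under stated assumptions, not the authors' derivation: "fixation" is approximated by non-extinction of a Galton-Watson lineage started from one mutant, and the offspring distribution is taken to be geometric on {0, 1, 2, ...} with mean 1 + s, a choice made here only because its variance, (1 + s)(2 + s), and its extinction probability, 1/(1 + s), are known in closed form. All function names are hypothetical.

```python
def survival_probability(pgf, iterations=20000):
    """Iterate q <- f(q) from q = 0; the limit is the extinction probability
    of a Galton-Watson process whose offspring pgf is f."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return 1.0 - q


def geometric_pgf(mean):
    """Probability generating function of a geometric offspring number
    on {0, 1, 2, ...} with the given mean: f(q) = 1 / (1 + mean * (1 - q))."""
    return lambda q: 1.0 / (1.0 + mean * (1.0 - q))


s = 0.02                           # selective advantage (illustrative value)
mean_k = 1.0 + s                   # E(K) of the mutant lineage
var_k = mean_k * (1.0 + mean_k)    # V(K) of the geometric distribution

exact = survival_probability(geometric_pgf(mean_k))
approx = 2.0 * s / var_k

print(exact, approx)
```

For this offspring distribution V(K) is close to 2 rather than 1, so the survival probability is close to s rather than the familiar 2s, which is the qualitative behavior that 2s/V(K) describes. The agreement is only approximate and is expected to degrade as s grows.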

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species. 

      Weaknesses: 

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent. 

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. 

      The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes". 

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations. 

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient. 

      {Alpha and alpha' can both be negative.  X^2 = 0.47 would yield x = -0.7}

      Reviewer #3 (Public Review): 

      Summary: 

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data. 

      Strengths: 

      (1) The theoretical results are well-described and easy to follow. 

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm. 

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind. 

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size. 

      Weaknesses: 

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process. 

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper. 

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakeley 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes. 

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...) 

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model. 

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances. 
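Point (5) above, that a branching process with Poisson offspring numbers becomes the Wright-Fisher model once the total population size is conditioned on, can be verified exactly for small cases. The sketch below is illustrative only and its function names are hypothetical. It computes the joint probability of a vector of offspring counts from independent Poisson parents, divides by the probability of the observed total (a sum of n independent Poisson variables with mean lam is Poisson with mean n * lam), and compares the result with the multinomial probability of Wright-Fisher sampling.

```python
from math import exp, factorial, prod


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson offspring number with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def conditioned_poisson_pmf(counts, lam):
    """P(K_1 = k_1, ..., K_n = k_n | sum_i K_i = total) for independent
    Poisson(lam) offspring numbers."""
    n = len(counts)
    total = sum(counts)
    joint = prod(poisson_pmf(k, lam) for k in counts)
    return joint / poisson_pmf(total, n * lam)


def wright_fisher_pmf(counts):
    """Multinomial probability of the same offspring counts when each of
    the `total` offspring picks one of n parents uniformly at random."""
    n = len(counts)
    total = sum(counts)
    coefficient = factorial(total)
    for k in counts:
        coefficient //= factorial(k)
    return coefficient / n ** total


counts = (2, 0, 1)   # offspring numbers of three parents
for lam in (0.5, 1.0, 1.3):
    print(lam, conditioned_poisson_pmf(counts, lam), wright_fisher_pmf(counts))
```

The conditional probability does not depend on lam, which is why the Poisson mean drops out after conditioning. Offspring distributions with V(K) ≠ E(K) do not reduce to the multinomial in this way.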

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned. 

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments. 

      Reviewer #2 (Recommendations For The Authors): 

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency. 

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation" 

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example. 

      The math notations in the supplement are not intuitive. e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X] for expectation and variance rather than \italic{E(X)} to improve the readability of many equations. 

      Eq A6, A7, While I manage to follow, P_{10}(t) and P_{10} are not defined anywhere in the text. 

      Supplement page 7, the term "probability of fixation" is confusing in a branching model. 

      Eq. A28. It is unclear whether eq. A.1 could be used here directly. Some justification would be nice. 

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions is incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and a clearer discussion of the limitations of the data presented.

      We thank the reviewers for appreciating our manuscript. We have rewritten the conclusions of the paper to be more conservative and now more explicitly focus on color processing in mouse V1, rather than comparing V1 to the retina. Additionally, we discuss the limitations of our approach in detail in the Discussion section. Finally, we have addressed all comments from the reviewers below.

      Referee 1 (Remarks to the Author):

      In this study, Franke et al. explore and characterize color response properties across primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake 2P imaging to define the spectral response properties of visual interneurons in layer 2/3. They find that opponent responses are more pronounced at photopic light levels, and that diversity in color opponent responses exists across the visual field, with green ON/ UV OFF responses more strongly represented in the upper visual field. This is argued to be relevant for the detection of certain features that are more salient when using chromatic space, possibly due to noise reduction. In the revised version, Franke et al. have addressed the potential pitfalls in the discussion, which is an important point for the non-expert reader. Thus, this study provides a solid characterization of the color properties of V1 and is a valuable addition to visual neuroscience research.

      My remaining concerns are based more on the interpretation. I’m still not convinced by the statement "This type of color-opponency in the receptive field center of V1 neurons was not present in the receptive field center of retinal ganglion cells and, therefore, is likely computed by integrating center and surround information downstream of the retina." and I would suggest rewording it in the abstract.

      As discussed previously and now nicely added to the discussion, it is difficult to make a direct comparison given the different stimulus types used to characterize the retina and V1 recordings and the different levels of adaptation in both tissues. I will leave this point to the discussion, which allows for a more nuanced description of the phenomenon. Why do I think this is important? In the introduction, the authors argue that "the discrepancy [of previous studies] may be due to differences in stimulus design or light levels." However, while different light levels can be tested in V1, this cannot be done properly in the retina with 2P experiments. To address this, one would have to examine color-opponency in RGC terminals in vivo, which is beyond the scope of this study. Addressing these latter points directly in the discussion would, in my opinion, only strengthen the study.

      We thank the reviewer for the feedback. We removed the sentence mentioned by the reviewer from the abstract, as well as from the summary of our results in the Introduction. Additionally, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Minor:

      In the abstract, the second sentence says that we already know the mechanisms in primates.

      Unfortunately, I do not think this is true. First, primates refers to an order with several species, which might have adaptations to their color-processing. Second, I’m aware of several characterizations in "primates" that have led to convincing models (as referenced), but in my opinion, this is far from a true understanding of the mechanisms, especially since very little is known about foveal color processing due to the difficulties of these experiments. Similarly in the introduction. "Primates" is indirectly defined as a species. Perhaps some rewording is needed here as well, since we know how different cone distributions can be in rodents (see Peichl’s work).

      Thanks. We have reworded the Abstract and Introduction to indicate that many studies have been performed in primate species, without suggesting that the mechanisms are described.

      The legend in Fig. 2 has a "Fig. ???"

      Fixed.

      Referee 2 (Remarks to the Author):

      Franke et al. characterize the representation of color in the primary visual cortex of mice, highlighting how this changes across the visual field. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet colors were presented in random combinations. Clustering of responses revealed a set of functional cell-types based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have different spatial distributions across V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths: The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our study.

      Weaknesses: While the study presents convincing evidence about the asymmetric distribution of color-opponent neurons in V1, the paper would greatly benefit from a more in-depth discussion of the caveats related to the conclusions drawn about their origin. This is particularly relevant regarding the conclusion drawn about the contribution of color opponent neurons in the retina. The mismatch between retinal color opponency and V1 color opponency could imply that this feature is not solely inherited from the retina, however, there are other plausible explanations that are not discussed here. Direct evidence for this statement remains weak.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      In addition, the paper would benefit from adding explicit neuron counts or percentages to the quadrants of each of the density plots in Figures 2-5. The variance explained by the principal components does not capture the percentage of color opponent cells. Additionally, there appear to be some remaining errors in the figure legend and labels that have not been addressed (e.g. ’??’ in Fig 2 legend).

      Thank you for this suggestion. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels. Additionally, we have fixed the broken reference in the legend of Fig. 2.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      General Suggestions:

      -  Please add possible caveats of using ETA method to the discussion section. For example, it is unclear to what extent ON/OFF cells are being overlooked by using ETA method.

      We now discuss the limitations of the ETA approach in the Discussion section.

      - The caveats of using the percentage of variance explained in the retina as evidence against V1 solely inheriting color-opponency from retinal output neurons are not adequately addressed. For example, could the mismatch in explained variance of the color axis between V1 and RGCs be explained by a subset of non-color opponent RGCs projecting elsewhere (not dLGN-V1) or that color opponent cells project to a larger number of neurons in V1 than non-color opponent cells? We suggest adding a paragraph to the discussion to address this issue.

      We have removed these conclusions from the paper, now interpret the retinal results more carefully, and mention that comparing ex-vivo retinal data with in-vivo cortical data is challenging.

      - Please clarify how the different response types shown in Figure 5e-f lead to differences in noise detection and thereby differences in predator discriminability. For example, why does Gon/UVoff not respond to the noise scene while Goff/UVoff does?

      We added this to the Results section.

      - Please clarify the relationship between ETA amplitude, neural response probability, and neural response amplitude. For example, do color-opponent cells have equal absolute neural response amplitudes to the different colors?

      Thank you for bringing up this point. The ETA is obtained by summing the stimulus sequences that elicit an event (i.e., response), weighted by the amplitude of the response. Consequently, the absolute amplitude of the ETA correlates with the calcium amplitude. Importantly, the ETA amplitudes of different stimulus conditions are comparable because they were estimated on the same normalized calcium trace. Therefore, comparing the absolute amplitudes of ETAs of color-opponent neurons reveals the response magnitude of the cells to different colors. We have now included this information in the Results section.
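The description of the ETA above can be sketched in a few lines. This is an illustration of the verbal description only, not the authors' analysis code: the function name, the representation of events as (frame index, amplitude) pairs, and the use of a single stimulus channel are all assumptions made here. Following the response, the sketch sums the stimulus segments preceding each event, weighted by event amplitude, and does not normalize by the number of events.

```python
def event_triggered_average(stimulus, events, window):
    """Amplitude-weighted sum of the stimulus segments preceding each event.

    stimulus: stimulus value per frame (e.g. contrast of one color channel)
    events:   (frame_index, amplitude) pairs from a normalized calcium trace
    window:   number of stimulus frames before the event to include
    """
    eta = [0.0] * window
    for frame, amplitude in events:
        start = frame - window
        if start < 0:
            continue  # not enough stimulus history before this event
        for lag, value in enumerate(stimulus[start:frame]):
            eta[lag] += amplitude * value
    return eta


# Toy example: one stimulus channel and two events of different amplitude.
stimulus = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
events = [(2, 2.0), (5, 1.0)]
print(event_triggered_average(stimulus, events, window=2))
```

Because each segment is scaled by its event amplitude, larger calcium events contribute more to the result, which is consistent with the statement that the absolute ETA amplitude correlates with the calcium amplitude.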

      Abstract: - "more than a third of neurons in mouse V1 are color-opponent in their receptive field center". It is unclear what data supports this statement. Can you please provide a statement in the manuscript that supports this directly using the number of neurons?

      We added the following sentence to the Results section: Nevertheless, a substantial fraction of neurons (33.1%) preferred color-opponent stimuli and scattered along the off-diagonal in the upper left and lower right quadrants, especially for the RF center.

      Figure 2: - There is a ?? in the figure legend. Which figure should this refer to? - please provide explicit neuron counts/percentages for each quadrant in b.

      We fixed the figure reference. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels.

      Figure 3: - Fig 3: Color scheme makes it very difficult to differentiate the different conditions, especially when printed.

      Thanks, we changed the color scheme.

      - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 4: - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 5: - Add explicit neuron counts/percentages for each quadrant in c.

      See above.

      Methods: - "we modeled each response type to have a square RF with 10 degrees visual angle in diameter". There appears to be a mismatch between this statement and Figure 5e where 18 degrees is reported.

      Thanks, we fixed that.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. The results are interesting and many aspects of the experiments and conclusions are well done; several technical concerns, however, limit the support for several main conclusions.

      Limitations of stimulus choice The paper relies on responses to a large (37.5 degree diameter) modulated spot and surround region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells (it is twice the area of the average V1 receptive field). As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). Most importantly, the surrounds of most of the recorded cells will be strongly activated by the central spot. This brings into question statements in the paper about selective activation of center and surround (e.g. page 2, right column). This in turn raises questions about several subsequent analyses that rely on selective center and surround activation.

      Thank you for this comment. A similar point was raised by a reviewer in the first round of revision. We agree with the reviewers that it is critical to discuss both the rationale behind our stimulus design and its limitations to facilitate better interpretation by the reader.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degrees visual angle in diameter, which is slightly larger than center RFs of single V1 neurons (between 20 - 30 degrees visual angle depending on the stimulus, see here). The disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude the possibility that the stimulus was misaligned for a subset of the recorded neurons used in our analysis. We agree with the reviewer that such misalignment might have caused the center stimulus to partially activate the surround. To further address this issue beyond the controls we have already implemented, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is beyond the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we have added the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion in a separate section. With this, we believe our manuscript explains both the rationale behind our stimulus design as well as important limitations of the approach.

      Comparison with retina A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. For example, the stimulus used for the V1 experiments almost certainly strongly stimulates both center and surround of retinal ganglion cells. The text focuses on color opponency in the receptive field centers of retinal ganglion cells, but center-surround opponency seems at least as relevant for such large spots. This issue needs to be described more clearly and earlier in the paper.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Limitations associated with ETA analysis One of the reviewers in the previous round of reviews raised the concern that the ETA analysis may not accurately capture responses of cells with nonlinear receptive field properties such as On/Off cells. This possibility and whether it is a concern should be discussed.

      Thanks for this comment. We now discuss the limitation of using an ETA analysis in the Discussion section.

      Discrimination performance poor Discriminability of color or luminance is used as a measure of population coding. The discrimination performance appears to be quite poor - with 500-1000 neurons needed to reliably distinguish light from dark or green from UV. Intuitively I would expect that a single cell would provide such discrimination. Is this intuition wrong? If not, how do we interpret the discrimination analyses?

      Thank you for raising this point. The plots in Fig. 2c (and Figs. 3-5) show discriminability in bits, with the discrimination accuracy in % highlighted by the dotted horizontal lines. For 500 neurons, the discriminability is approx. 0.8 bits, corresponding to 95% accuracy. Even for 50 neurons, the accuracy is significantly above chance level. We now mention in the legends that the dotted lines indicate decoding accuracy in %.

    1. Author response:

      The following is the authors’ response to the current reviews.

      (1) Though we cannot survey all mutants, our observation that 774 genetically diverse adaptive mutants converge at the level of phenotype is important. It adds to growing evidence (see PMID33263280, PMID37437111, PMID22282810, PMID25806684) that the genetic basis of adaptation is not as diverse as the phenotypic basis. This convergence could make evolution more predictable.

      (2) Previous fitness competitions using this specific barcode system have been run for greater than 25 generations (PMID33263280, PMID27594428, PMID37861305, PMID27594428). We measure fitness per cycle, rather than per generation, so our fitness advantages are comparable to those in the aforementioned studies, including Venkataram and Dunn et al. (PMID27594428).

      (3) Our results remain the same upon removing the ~150 lineages with the noisiest fitness inferences, including those the reviewer mentions (see Figure S7).

      (4) We agree that there are likely more than the 6 clusters that we validated with follow-up studies (see Discussion). The important point is that we see a great deal of convergence in the behavior of diverse adaptive mutants.

      (5) The growth curves requested by the reviewer were included in our original manuscript; several more were added in the revision (see Figures 5D, 5E, 7D, S11B, S11C).


      The following is the authors’ response to the original reviews.

      Public Reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.  

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation. 

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      Weaknesses: 

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements. 

      This is a misunderstanding that we clarified in this revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons. 

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and perhaps more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we explicitly stated that these 21,000 isolated lineages do not all represent unique, adaptive lineages. We changed the word “lineages” to “isolates” where relevant in Figure 2 and the accompanying legend. And we have added the following sentence to the figure 2 legend (line 212), “These ~21,000 isolates do not represent as many unique, adaptive lineages because many either have the same barcode or do not possess adaptive mutations.”

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Most of these studies survey fewer than 774 mutants. Further, our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 176 - 178).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. 

      We now devote 19 lines of text to discussing this bias (on lines 160 - 162, 278-284, and in more detail on 758 - 767).

      We walk through an example of a class of mutants that our study misses. On lines 759 - 763, we say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we added more text earlier in the manuscript that explicitly discusses this bias. Lines 278 – 283 now read, “The 774 lineages we focus on are biased towards those that are reproducibly adaptive in multiple environments we study. This is because lineages that have low fitness in a particular environment are rarely observed >500 times in that environment (Figure S4). By requiring lineages to have high-coverage fitness measurements in all 12 conditions, we may be excluding adaptive mutants that have severe tradeoffs in one or more environments, consequently blinding ourselves to mutants that act via unique underlying mechanisms.”

      Note that while we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs. 

      We agree and discussed exactly the reviewer’s point about our inclusion threshold in the 19 lines of text mentioned previously (lines 160 - 162, 278-284, and 758 - 767). To add to this discussion, and avoid the misunderstanding the reviewer mentions, we added the following strongly-worded sentence to the end of the paragraph on lines 749 – 767 in our revised manuscript: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”. 

      More generally speaking, we set up our study around Figure 1, which depicts a treatment strategy that works best if there exists but a single type of adaptive mutant. Despite our inclusion threshold, we find there are at least 6 types of mutants. This diminishes hopes of designing simple multidrug strategies like Figure 1. Our goal is to present a tempered and nuanced discussion of whether and how to move forward with designing multidrug strategies, given our observations. On one hand, we point out how the phenotypic convergence we observe is promising. But on the other hand, we also point out how there may be less convergence than meets the eye for various reasons including the inclusion threshold the reviewer mentions (lines 749 - 767).

      We have made several minor edits to the text with the goal of providing a more balanced discussion of both sides. For example, we added the words, “may yet” to the following sentences on lines 32 – 36 of the abstract: “These findings, on one hand, demonstrate the difficulty in relying on consistent or intuitive tradeoffs when designing multidrug treatments. On the other hand, by demonstrating that hundreds of adaptive mutations can be reduced to a few groups with characteristic tradeoffs, our findings may yet empower multidrug strategies that leverage tradeoffs to combat resistance.”

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations. 

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult. 

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system). 

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay. 

      Previous work has demonstrated that in this evolution platform, most mutations occur during the transformation that introduces the DNA barcodes (Levy et al. 2015). In other words, these mutations are already present and do not accumulate during the 40 generations of evolution. Therefore, the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      We have added the following sentence to the main text to highlight this issue (lines 247 - 249): “This happens because the barcoding process is slightly mutagenic, thus there is less need to wait for DNA replication errors to introduce mutations (Levy et al. 2015; Venkataram et al. 2016).”

      We also elaborate on this in the method section entitled, “Performing barcoded fitness competition experiments,” where we added a full paragraph to clarify this issue (lines 972 - 980).

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.  Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages. This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted. 

      Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing this noise (Figure S7B).
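      The coverage filter discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested `counts` layout, the function name, and the choice to sum reads over time points within each condition are hypothetical, and the manuscript's exact rule may differ.

```python
def filter_by_coverage(counts, min_reads):
    """Keep lineages whose summed read count meets min_reads in EVERY condition.

    counts: dict mapping lineage -> {condition: [reads per time point]}
            (hypothetical layout chosen for this sketch)
    min_reads: coverage threshold, e.g. 500 or 5000
    """
    kept = []
    for lineage, by_condition in counts.items():
        # A single low-coverage condition is enough to exclude the lineage.
        if all(sum(reads) >= min_reads for reads in by_condition.values()):
            kept.append(lineage)
    return kept


# Toy data with two conditions and two time points each.
toy_counts = {
    "bc1": {"no_drug": [300, 300], "low_flu": [400, 200]},
    "bc2": {"no_drug": [4000, 3000], "low_flu": [10, 20]},
    "bc3": {"no_drug": [3000, 3000], "low_flu": [2500, 2600]},
}
print(filter_by_coverage(toy_counts, 500))   # ['bc1', 'bc3']
print(filter_by_coverage(toy_counts, 5000))  # ['bc3']
```

      In the toy data, `bc2` is abundant in one condition but nearly absent in the other, so it is dropped at either threshold. This mirrors the bias discussed above against mutants with a severe tradeoff in any single environment.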

      More importantly, we devoted 4 figures and 200 lines of text to demonstrating that the clusters we identified capture biologically meaningful differences between mutants (and not noise). We have modified the main text to point readers to figures 5 through 8 earlier, such that it is more apparent that the clustering analysis is just the first piece of our data demonstrating convergence at the level of phenotype.

      (4) The authors make the decision to use UMAP and a gaussian mixed model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axis does not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components. 

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent an intuitive phenotype, like resistance to fluconazole.  Moreover, we see many non-linearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, plus these nonlinearities can be missed by PCA, thus we prefer other clustering methods. 

      Still, we agree that confirming our clusters are robust to different clustering methods is helpful. We have included PCA in the revised manuscript, plotting PC1 vs PC2 as Figure S9 with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages. 

      We worry that the idea stems from a priori notions of what the important dimensions should be. The biology of our system is unfortunately not intuitive. For example, it seems like this idea would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole. 

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. 

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. Another factor we considered was the set of follow-up genotyping and phenotyping studies that confirm biologically meaningful differences between the mutants in each cluster (Figures 5 – 8). We now state this explicitly. Here is the modified paragraph where we describe how we chose a model with 7 clusters, from lines 436 – 446 of the revised manuscript:

      “Beyond the obvious divide between the top and bottom clusters of mutants on the UMAP, we used a gaussian mixture model (GMM) (Fraley and Raftery, 2003) to identify clusters. A common problem in this type of analysis is the risk of dividing the data into clusters based on variation that represents measurement noise rather than reproducible differences between mutants (Mirkin, 2011; Zhao et al., 2008). One way we avoided this was by using a GMM quality control metric (BIC score) to establish how splitting out additional clusters affected model performance (Figure S6). Another factor we considered were follow-up genotyping and phenotyping studies that demonstrate biologically meaningful differences between mutants in different clusters (Figures 5 – 8). Using this information, we identified seven clusters of distinct mutants, including one pertaining to the control strains, and six others pertaining to presumed different classes of adaptive mutant (Figure 4D). It is possible that there exist additional clusters, beyond those we are able to tease apart in this study.”
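      For readers unfamiliar with the BIC score mentioned in this paragraph, the sketch below shows how it is computed for a Gaussian mixture with full covariance matrices. It assumes the "lower is better" convention, BIC = k ln(n) - 2 ln(L); some software, including the mclust package associated with Fraley and Raftery, reports the negated quantity. The function names and the choice of full covariances are assumptions for illustration, and the log-likelihood must come from a separately fitted model.

```python
import math


def gmm_free_parameters(n_components, n_dims):
    """Number of free parameters in a full-covariance Gaussian mixture."""
    weights = n_components - 1                      # mixing weights sum to 1
    means = n_components * n_dims                   # one mean vector each
    covariances = n_components * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion, lower-is-better convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Example: a 7-component mixture in 2 dimensions has 41 free parameters.
print(gmm_free_parameters(7, 2))  # 41
```

      Each extra component adds parameters and therefore a penalty, so BIC only keeps falling if the added cluster improves the likelihood by more than that penalty. A curve that "levels off" means further splits are no longer paying for themselves, which is why BIC alone may not single out one cluster number.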

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase and not decrease the optimal number of clusters. Additional criteria to BIC could have been used to help choose the optimal number of clusters or even if mixed Gaussian modeling is appropriate for this dataset. 

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise. 

      Most importantly, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).  

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. 

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously, since we found this to be effective in our previous work (PMID37237236). 
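      The difference between a two-point fold change and a multi-timepoint estimate can be illustrated with a simple least-squares slope of the log frequency ratio between a lineage and a reference. This is a simplified sketch, not the inference procedure of the cited studies; the function name, the use of a single reference frequency series, and the ordinary least-squares fit are all assumptions made for illustration.

```python
import math


def log_ratio_slope(times, lineage_freqs, reference_freqs):
    """Least-squares slope of ln(lineage / reference) against time.

    times:           sampling times (e.g. growth cycles)
    lineage_freqs:   barcode frequency of the lineage at each time
    reference_freqs: frequency of a reference (e.g. ancestor) at each time
    """
    ys = [math.log(f / r) for f, r in zip(lineage_freqs, reference_freqs)]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


# Toy lineage that gains 0.5 log units per cycle relative to the reference.
toy_times = [0, 1, 2, 3]
toy_lineage = [0.01 * math.exp(0.5 * t) for t in toy_times]
toy_reference = [0.5, 0.5, 0.5, 0.5]
toy_slope = log_ratio_slope(toy_times, toy_lineage, toy_reference)
```

      Fitting across all time points means that sampling noise at any one time point has less influence than it would in a two-point fold change, which is the intuition behind preferring the multi-timepoint approach.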

      Perhaps also relevant is that the main assay we use to measure fitness has been previously validated (PMID27594428) and no subsequent study using this assay validates using the methods suggested above (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203). Similarly, bar-seq has been used, without the suggested validation, to demonstrate that the way some mutant’s fitness changes across environments is different from other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate. 

      For all of these reasons above, we are hesitant to confirm bar-seq itself as a valid way to infer fitness. It seems this is already accepted as a standard in our field. However, please see below.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors. 

      While we don’t agree that fitness measurements obtained from this bar-seq assay generally require validation, we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways.

      Our manuscript has 4 figures (5 - 8) and over 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. In the revised manuscript, we added additional validation experiments, such that three figures (Figures 5, 7 and S11) now involve growth curves, as the reviewer requested. 

      Below, we walk through the different types of validation experiments that are present in our manuscript, including those that were added in this revision.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the relevant double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S11), finding that mutants from different clusters have different growth curves. In the revised manuscript, we added growth curves for 6 additional mutants (3 from cluster 1 and 3 from cluster 3), demonstrating that only the cluster 1 mutants have a tradeoff in high concentrations of fluconazole (see Figure 5D & 5E). In sum, this work demonstrates that mutants from different clusters have predictable differences in their growth phenotypes.

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. They often do (see pie charts in Figures 5, 6, 7, 8). In the revised manuscript, we extended this analysis to include mutants from cluster 1. Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole. In our revised manuscript, we show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see pie chart in new Figure 5A). No other cluster’s evolutionary history shows this pattern (compare to pie charts in figures 6, 7, and 8).

      These pie charts also provide independent confirmation supporting the fitness tradeoffs observed for each cluster in figure 4E. For example, mutants in cluster 5 appear to have a tradeoff in a particular double drug condition (HRLF), and the pie charts confirm that they rarely originate from that evolution condition. This differs from cluster 4 mutants, which do not have a fitness tradeoff in HRLF, and are more likely to originate from that environment (see purple pie slice in figure 7). Additional cases where results of evolution experiments (pie charts) confirm observed fitness tradeoffs are discussed in the manuscript on lines 320 – 326, 594 – 598, 681 – 685.

      (3) Mutants from each cluster often fall into different genes: We sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6). 

      (4) Mutants from each cluster have behaviors previously observed in the literature: We compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 485 - 491). Previous work suggests that some mutations to PDR have different tradeoffs than others, which corresponds to our finding that PDR mutants fall into two separate clusters (lines 610 - 612). IRA1 mutants were previously observed to have high fitness in our “no drug” condition and are found in the cluster that has the highest fitness in the “no drug” condition (lines 691 - 696). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 702 - 704).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we applied various re-clustering and/or normalization approaches to our data (Fig 6, S5, S7, S8, S10). The clusters of mutants that we observe in figure 4 do not change substantially when we re-cluster the data. In our revised manuscript, we added another clustering method: principal component analysis (PCA) (Fig S9). Again, we found that our clusters are largely preserved.

      While these experiments demonstrate meaningful differences between the mutants in each cluster, important questions remain. For example, a long-standing question in biology centers on the extent to which every mutation has unique phenotypic effects versus the extent to which scientists can predict the effects of some mutations from other similar mutations. Additional studies on the clusters of mutants discovered here will be useful in deepening our understanding of this topic and more generally of the degree of pleiotropy in the genotype-phenotype map.

      Reviewer #2 (Public Review): 

      Summary: 

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths: 

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates which are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.  

      Weaknesses: 

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one! 

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think: 

      We have expanded the introduction, in particular lines 129 – 157 of the revised manuscript, to walk readers through the connection between fitness tradeoffs and molecular mechanisms. For example, here is one relevant section of new text from lines 131 - 136: “The intuition here is as follows. If two groups of drug resistant mutants have different fitness tradeoffs, it could mean that they provide resistance through different underlying mechanisms. Alternatively, both could provide drug resistance via the same mechanism, but some mutations might also affect fitness via additional mechanisms (i.e. they might have unique “side-effects” at the molecular level) resulting in unique fitness tradeoffs in some environments.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). For example, we are interested in whether diverse mutations converge at the level of phenotype and fitness. Figure 1A depicts a scenario with a lot of convergence in that all adaptive mutations have the same fitness tradeoffs.

      The reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the structure of the genotype-phenotype-fitness map apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So, we cited papers from across the tree of life to support this sentence.  And in the next sentence, where we cite 3 papers focusing solely on fungal research, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should also apply broadly, beyond yeast.

      On the other hand, because we study drug resistant mutations, we hope that our dataset and observations are of use to scientists studying the evolution of resistance. We use our introduction to explain how the structure of the genotype-phenotype-fitness map might influence whether a multidrug strategy is successful (Figure 1).

      We are hesitant to rework our introduction to focus more specifically on fungal infections as this is not our primary area of expertise.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). 

      In the revised manuscript, we have edited several lines (line 95, 186, 822) to state that the organism used in this work is Saccharomyces cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly? 

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance.

      Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper. 

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections? 

      Perhaps because our background lies in general study of the genotype-phenotype map, we are hesitant about making bold assertions about how our work might apply to pathogenic yeasts. We are hopeful that our work will serve as a stepping-stone such that scientists from that community can perhaps make (and test) such statements.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I found the ideas and the questions asked in this manuscript to be interesting and ambitious. The setup of the evolution and fitness competition experiments was well poised to answer them, but the analysis of the data is not currently enough to properly support the claims made. I would suggest revising the analysis to address the weaknesses raised in the public review and if possible, adding some more experimental validations. As you already have genome sequencing data showing the causal mutation for many mutants across the different clusters, it should be possible for you to reconstruct some of the strains and validate their phenotypes and cluster identity.

      Yes, this is possible. We added more validation experiments (see figure 5). We already had quite a few validation experiments (figures 5 - 8 and lines 479 - 718), but we did not clearly highlight the significance of these analyses in our original manuscript. Therefore, we modified the text in our revised manuscript in various places to do so. For example, we now make clearer that we jointly use BIC scores as well as validation experiments to decide how many clusters to describe (lines 436 - 446). We also make clearer that our clustering analysis is only the first step towards identifying groups of mutants with similar tradeoffs by using words and phrases like, “we start by” (line 411) and “preliminarily” (line 448) when discussing the clustering analysis.  We also point readers to all the figures describing our validation experiments earlier (line 443), and list these experiments out in the discussion (lines 738 - 741).

      Also, please deposit your genome sequencing data in a public database (I am not sure I saw it mentioned anywhere). 

      We have updated line 1088 of the methods section to include this sentence: “Whole genome sequences were deposited in GenBank under SRA reference PRJNA1023288.”

      Reviewer #2 (Recommendations For The Authors):

      I don't think the figures or experiments can be improved upon, they are excellent. There are a few times I feel things are written in a rather confusing way and could be explained better, but also I feel there are places the authors jump from one thing to another really quickly and the reader (who might not be an expert in this area) will struggle to keep up. For example: 

      Explaining what RAD is - it is introduced in the methods, but what it is, is not really explained. 

      Since the introduction is already very long, we chose not to explain radicicol’s mechanism of action here. Instead, we bring this up later on lines 614 – 621 when it becomes relevant.

      More generally, in response to this advice and that from reviewer 1, we also added text to various places in the manuscript to help explain our work more clearly. In particular, we clarified the significance of our validation experiments and various important methodological details (see above). We also better explained the connection between fitness tradeoffs and mechanisms (see above) and added more details about the potential use cases of our approach (lines 142 – 150).

      The abstract states "some of the groupings we find are surprising. For example, we find some mutants that resist single drugs do not resist their combination, and some mutants to the same gene have different tradeoffs than others". Firstly, this sentence is a bit confusing to read but if I've read it as intended, then is it really surprising? It's difficult for organisms (bacteria and fungi) to develop multiple beneficial mutations conferring drug resistance on the same background, hence why combination antifungal drug therapy is often used to treat infections. 

      This is a place where brevity got in the way of clarity. We added a bit of text to make clear why we were surprised. Specifically, we were surprised because not all mutants behave the same. Some resist single drugs AND their combination. Some resist single drugs but not their combination. The sentence in the abstract now reads, “For example, we find some mutants that resist single drugs do not resist their combination, while others do. And some mutants to the same gene have different tradeoffs than others.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      The manuscript would be strengthened with the following key revisions mostly having to do with image quality: 

      (1) It is very difficult in Figure 4B to see which nuclei actually have evidence of mitochondrial transcripts. It might be helpful to provide arrows to specific cells and also to provide some estimate of the percentage of cells with nuclear mt-transcripts as measured by ISH compared to the 3-6% of cortex cell estimate seen in the snRNAseq analysis. 

      As suggested, we have now added arrows to help readers see the signals in nuclei. The detection thresholds of ISH and single-nucleus RNA-seq are likely to differ, and therefore estimating the fraction of PT-Mito cells by ISH would not be reliable.

      (2) The phospho-PKR images provided as evidence of C16 activity (Supplemental Figure 1) are too dim to be very useful. Could brighter images be provided? 

      We have now adjusted the LUTs of images in Supplemental Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and may be a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural, and computational analyses using other metal-ion dependent nucleases. 

      We appreciate the reviewer for the positive assessment as well as all the comments and suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, use only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time-resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI nuclease, and proposed concerted functions of the metal and the histidine.

      Strengths: 

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach. 

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site-specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6. 

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the His-Me family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general.

      Thank you very much for your comments and suggestions.

      Weaknesses: 

      Two relatively minor issues are raised here for consideration: 

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of the soaking of the metal ion. Crystallography has just been used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking....". 

      We appreciate the clarification regarding the description of our experimental approach. We agree that our structures do not represent reaction intermediates but rather mixtures of substrate and product states within the enzyme-bound environment. We have revised the text accordingly to more accurately reflect our methodology.

      p. 5, the beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, there is still no metal ion density shown in the key Figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn. 

      Thank you for your insightful comments. We recognize the importance of visualizing metal ion density alongside product density data. To address this, we now present the Mg2+/Mn2+ and product densities together in Figure S4.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 6. I understand that pre-reaction state (left panel) and Metal-binding state (two middle panels) are in equilibrium. But can we state that the Metal-binding state (two middle panels) and the product state (right panel) are in equilibrium and connected by two arrows? 

      Thank you for your comments. We agree that the DNA hydrolysis reaction process may not be reversible within the I-PpoI active site. To clarify, we removed the backward arrows between the metal-binding state and product state. In addition, we thank the reviewer for suggesting a name for the middle state and agree that it is better to label it. We added the metal-binding state label in the revised Figure 6 and also added “on the other hand, optimal alignment of a deprotonated water and Mg2+ within the active site, labeled as metal-binding state, leads to irreversible bond breakage (Fig. 6a)” within the text.

      (2) The section on DNA hydrolysis assay (Materials and Methods) is not well described. In this section, the authors should summarize the methods for the experiments in Figure 4 AC, Figure 5BC, Figure S3C, Figure S4EF, and Figure S6AB. The authors presented some graphs for the reactions. For clarity, the author should state in the legends which experiments the results are from (in crystallo or in solution). Please check and modify them. 

      Thank you for the suggestion. We have added four paragraphs to detail the experimental procedures for the experiments in these figures. In addition, we have checked all of the figure legends and labeled each as “in crystallo” or “in solution.” To clarify, we also added “in crystallo” or “solution” in the corresponding panels.

      (3) The authors showed the anomalous signals of Mn2+ and Tl+. The authors should mention which wavelength of X-rays was used in the data collections to calculate the anomalous signals. 

      Thank you for the suggestion. We have included the wavelength of the X-ray in the figure legends that include anomalous maps, which were all determined at an X-ray wavelength of 0.9765 Å.

      (4) The full names of "His-Me" and "HNH" are necessary for a wide range of readers. 

      Thank you for the suggestion. We have included the full nomenclature for His-Me (histidine-metal) nucleases and HNH (histidine-asparagine-histidine) nuclease.

      (5) The authors should add the side chain of Arg61 in Figure 1E because it is mentioned in the main text. 

      Thank you for the suggestion. We have added Arg61 to Figure 1E.

      (6) Figure 5D. For clarity, the electron densities should cover the Na+ ion. The same request applies to WatN in Figure S3B.

      Thank you for catching this detail. We have added the electron density for the Na+ ion in Figure 5D and WatN in Figure S3B.

      (7) At line 269 on page 8, what is "previous H98A I-PpoI structure with Mn2+"? Is the structure 1CYQ? If so, it is a complex with Mg2+. 

      Thank you for catching this detail. We have edited the text to “previous H98A I-PpoI structure with Mg2+.”

      (8) At line 294 on page 9, "and substrate alignment or rotation in MutT (66)." I think "alignment of the substrate and nucleophilic water" is preferred rather than "substrate alignment or rotation". 

      Thank you for the suggestion. We have edited the text to “alignment of the substrate and nucleophilic water.”

      (9) At line 305 on page 9, "Second, (58, 69-71) single metal ion binding is strictly correlated with product formation in all conditions, at different pH and with different mutants (Figure 3a and Supplementary Figure 4a-c) (58)". The references should be cited in the correct positions. 

      Thank you for catching this typo. We have removed the references.

      (10) At line 347 on page 10, "Grown in a buffer that contained (50 g/L glucose, 200 g/L α-lactose, 10% glycerol) for 24 hrs." Is this sentence correct? 

      Thank you for catching this detail. We have corrected the sentence.

      (11) At line 395 on page 11, "The His98Ala I-PpoI crystals of first transferred and incubated in a pre-reaction buffer containing 0.1M MES (pH 6.0), 0.2 M NaCl, 1 mM MgCl2 or MnCl2, and 20% (w/v) PEG3350 for 30 min." In the experiments using this mutant, does a pre-reaction buffer contain MgCl2 or MnCl2? 

      Thank you for bringing this to our attention. We performed two sets of experiments: 1) metal ion soaking in 1 mM Mn2+, which was performed as for WT and did not include Mn2+ in the pre-reaction buffer; 2) imidazole soaking, for which 1 mM Mn2+ was included in the pre-reaction buffer. We reasoned that Mn2+ would not bind or promote reaction with His98Ala I-PpoI, but that pre-incubation might help populate Mn2+ within the lattice for better imidazole binding. However, neither Mn2+ nor imidazole was observed. We have added experimental details for both experiments with His98Ala I-PpoI.

      (12) In the figure legends of Figure 1, is the Fo-Fc omit map shown in yellow not in green? Please remove (F) in the legends. 

      We have changed the Fo-Fc map to be shown in violet. We have also removed (f) from the figure legends.

      (13) I found descriptions of "MgCl". Please modify them to "MgCl2". 

      Thank you for catching these details. We have modified all “MgCl” to “MgCl2.”

      (14) References 72 and 73 are duplicated. 

      We have removed the duplicated reference.

      Reviewer #2 (Recommendations For The Authors): 

      p. 9, first paragraph, last three lines: "Thus, we suspect that the metal ion may play a crucial role in the chemistry step to stabilize the transition state and reduce the electronegative buildup of DNA, similar to the third metal ion in DNA polymerases and RNaseH." This point is significant but the statement seems a little uncertain. You are saying that the single metal plays the role of two metals in polymerase, in both the ground state and the transition state. I believe the sentence can be stronger and more explicit. 

      Thank you for raising this point. We suspect the single metal ion in I-PpoI is different from the A-site or B-site metal ion in DNA polymerases and RNaseH, but similar to the third metal ion in DNA polymerases and nucleases. As we stated in the text,

      (1) the metal ion in I-PpoI is not required for substrate alignment. The water molecule and substrate can be observed in place even in the absence of the metal ion. In contrast, the A-site and B-site metal ions in DNA polymerases and RNaseH are required for aligning the substrates.

      (2) Moreover, the appearance of the metal ion is strictly correlated with product formation, similar to the third metal ion in DNA polymerase and RNaseH.

      To emphasize our point, we have revised the sentence to read:

      “Thus, similar to the third metal ion in DNA polymerases and RNaseH, the metal ion in I-PpoI is not required for substrate alignment but is essential for catalysis. We suspect that the single metal ion helps stabilize the transition state and reduce the electronegative buildup of DNA, thereby promoting DNA hydrolysis.”

      Minor typos: 

      p. 2, line 4 from bottom: due to the relatively low resolution... 

      Thank you for catching this. We have edited the text to “due to the relatively low resolution.”

      Figure 4F: What is represented by the pink color? 

      The structures are color-coded as 320 s at pH 6 (violet), 160 s at pH 7 (yellow), and 20 s at pH 8 (green). We have included the color information in the figure legend and made the labeling clearer in the panel.

      p. 9, first paragraph, last line: ...similar to the third... 

      Thank you for catching this. We have edited the text.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions. A minor weakness is the limited description of computational methods and analysis of data. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

      We thank the editors and reviewers for the positive and encouraging comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transition that occurs in proteins at ca. 200K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme and green fluorescent protein) and experiments were performed at a range of instrument time resolutions (1ns - 10ps). The authors' data look compelling and suggest that transitions in the protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e. instrument response limited. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected. The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

      Following the reviewer’s suggestion, we studied the secondary structure content and tertiary structure of the CYP protein at different hydration levels (h = 0.2 and 0.4) through molecular dynamics simulation. As shown in Table S2 and Fig. S6, the extent of hydration does not alter the protein secondary structure content or overall packing. Thus, this result also suggests that water molecules have more influence on protein dynamics than on protein structure. We added the above results to the revised SI.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study that tries to elucidate whether water and protein motions are coupled at the dynamical transition temperature. The origin of the dynamical transition temperature, and specifically its relation to hydration, has been highly debated for decades.

      Strengths:

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The MD data presented appears to be missing a description of the methods used.

      If these data support the authors' claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented that shows the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI.

      “The initial structure of the protein cytochrome P450 (CYP) for the simulations was taken from the PDB crystal structure (2ZAX). Two protein monomers were placed in a cubic box. 1013 and 2025 water molecules were inserted into the box randomly to reach a mass ratio of 0.2 and 0.4 gram water/1 gram protein, respectively, which mimics the experimental condition. Then 34 sodium counter ions were added to keep the system neutral in charge. The CHARMM 27 force field in the GROMACS package was used for CYP, whereas the TIP4P/Ew model was chosen for water. The simulations were carried out at a broad range of temperatures from 360 K to 100 K, with a step of 5 K. At each temperature, after a 5000-step energy-minimization procedure, a 10 ns NVT simulation was conducted. After that, a 30 ns NPT simulation was carried out at 1 atm with the proper periodic boundary conditions. As shown in Fig. S7, 30 ns is sufficient to equilibrate the system. The temperature and pressure of the system were controlled by the velocity rescaling method and the method by Parrinello and Rahman, respectively. All bonds of water in all the simulations were constrained with the LINCS algorithm to maintain their equilibrium length. In all the simulations, the system was propagated using the leap-frog integration algorithm with a time step of 2 fs. The electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method. A non-bonded pair-list cutoff of 1 nm was used and the pair-list was updated every 20 fs. All MD simulations were performed using the GROMACS 4.5.1 software package.”
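
      The numbers in the quoted setup can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the molar mass of water (18.015 g/mol, light water) is a standard value rather than something stated in the response, and the helper names are hypothetical. It checks whether the two water counts imply the same protein mass, and counts the temperatures and integration steps implied by the stated schedule.

```python
# Consistency checks on the simulation setup quoted above (illustrative sketch).
WATER_MOLAR_MASS = 18.015  # g/mol for H2O; standard value, not taken from the response


def implied_protein_mass(n_water, hydration):
    """Protein mass (g/mol) implied by n_water molecules at `hydration` g water per g protein."""
    return n_water * WATER_MOLAR_MASS / hydration


def temperature_schedule(start=360, stop=100, step=5):
    """Temperatures in K from `start` down to `stop` inclusive."""
    return list(range(start, stop - 1, -step))


def n_steps(duration_ns, timestep_fs=2):
    """Number of integration steps for a run of `duration_ns` nanoseconds."""
    return round(duration_ns * 1_000_000 / timestep_fs)


mass_h02 = implied_protein_mass(1013, 0.2)   # two monomers, h = 0.2
mass_h04 = implied_protein_mass(2025, 0.4)   # two monomers, h = 0.4
temps = temperature_schedule()
nvt_steps = n_steps(10)
npt_steps = n_steps(30)
pairlist_interval = 20 // 2                  # pair-list update every 20 fs at 2 fs per step

print(f"implied mass of two monomers: {mass_h02:.0f} vs {mass_h04:.0f} g/mol")
print(f"{len(temps)} temperatures, {nvt_steps} NVT + {npt_steps} NPT steps each")
print(f"pair-list updated every {pairlist_interval} steps")
```

      Both water counts imply nearly the same mass for the two monomers, which is what one would expect if the counts were chosen to hit h = 0.2 and h = 0.4 for the same protein content.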

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Response to author's changes:

      See public review: The MD data presented appears to be missing a description of the methods used.

      If these data support the authors' claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented that shows the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI. Please see Reply 5.

      Reviewer #2 (Recommendations For The Authors):

      The authors answered my questions and substantially improved the manuscript.

      We thank the reviewer for the encouraging comments.

    1. Author response:

      'We thank the reviewers for their helpful comments and criticisms of our manuscript and are pleased by the overall positive nature of the comments. For the eLife Version of Record, we plan to carry out the following experiments to address reviewer comments:

      - We will use genetic approaches (e.g., driving p35 in glia to block apoptosis) and molecular markers, such as phospho-Histone H3, to assess whether reduced glial proliferation or increased glial apoptosis contributes to reduced glial cell number.

      - We will assess the ability of glial-specific expression of the Drosophila or Human ifc/DEGS1 transgenes to rescue the ifc lethal phenotype to adulthood.

      - We will replicate key phenotypic findings with additional ifc alleles.

      - We will enhance our characterization of 3xP3 RFP transgenes with respect to glial subtypes both for the insert we used in our study and at least one independent insert.

      - We will edit the text of the manuscript to clarify additional points raised by the reviewers.

      Once we complete the above approaches, we will modify our manuscript accordingly and submit a full response to the reviews to eLife along with the revised manuscript.'

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to consider the effects of phonotactics on the effectiveness of memory reactivation during sleep. They have created artificial words that are either typical or atypical and showed that reactivation improves memory for the latter but not the former.

      Comment 1:

      Strengths:

      This is an interesting design and a creative way of manipulating memory strength and typicality. In addition, the spectral analysis on both the wakefulness data and the sleep data is well done. The article is clearly written and provides a relevant and comprehensive review of the literature and of how the results contribute to it.

      We thank the reviewer for his/her positive evaluation of our manuscript. 

      Comment 2:

      Weaknesses:

      (1) Unlike most research involving artificial language or language in general, the task engaged in this manuscript did not require (or test) learning of meaning or translation. Instead, the artificial words were arbitrarily categorised and memory was tested for that categorisation. This somewhat limits the interpretation of the results as they pertain to language science, and qualifies comparisons with other language-related sleep studies that the manuscript builds on.

      We thank the reviewer for this comment. We agree that we did not test for meaning or translation but used a categorization task in which we trained subjects to discriminate artificial words according to their reward associations (rewarded vs. non-rewarded). Previous language studies (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967) used artificial words to investigate implicit learning of hidden grammar rules. There, the researchers studied generalization of previously learned grammar knowledge by testing subjects' ability to correctly categorize a novel set of artificial words into rule-congruent versus rule-incongruent words. These differences from our study design might limit the comparability between the results of previous language studies of artificial grammar learning and our findings. We now discuss this aspect as a limitation of our novel paradigm.

      We added the following sentences to the discussion on p.14, ll. 481-488:

      Based on our paradigm, we investigated categorization learning of artificial words according to their reward associations (rewarded vs. unrewarded) and did not study aspects of generalization learning of artificial grammar rules (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967). This difference might limit the comparability between these previous language-related studies and our findings. However, the use of artificial words with distinct phonotactical properties provided a successful way to manipulate learning difficulty and to investigate the influence of word properties on TMR, whereas our reward categorization learning paradigm had the advantage of increasing the relevance of the word learning through incentives.

      Comment 3:

      (2) The details of the behavioural task are hard to understand as described in the manuscript. Specifically, I wasn't able to understand when words were to be responded to with the left or right button. What were the instructions? Were half of the words randomly paired with left and half with right and then half of each rewarded and half unrewarded? Or was the task to know if a word was rewarded or not and right/left responses reflected the participants' guesses as to the reward (yes/no)? Please explain this fully in the methods, but also briefly in the caption to Figure 1 (e.g., panel C) and in the Results section.

      We thank the reviewer for this comment and have added sentences to the document to provide additional explanations. We instructed the participants to respond to each word with a left- or right-hand button press, where one button means the word is rewarded and the other button means the word is unrewarded. The assignment of left- and right-hand button presses to their meanings (rewarded versus unrewarded) differed across subjects. In the beginning, they had to guess. Then, over trial repetitions with feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words.

      We added the following sentences to the results section on p.5, ll. 161-168: 

      As a two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). In the beginning, they had to guess. Through three presentations of each word in randomized order and feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words (Fig. 1c).

      We added the following sentences to the caption of Figure 1 on p.6, ll. 188-194:

      As a two-alternative forced-choice task, responses of left- and right-hand button presses were assigned to the rewarded and the unrewarded word category, respectively. The participants were instructed to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). d) Feedback matrix with the four answer types (hits: rewarded and correct; CR, correct rejections: unrewarded and correct; misses: rewarded and incorrect; FA, false alarms: unrewarded and incorrect) with regard to the response and the reward assignment of the word.
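
      The four answer types in the feedback matrix follow mechanically from two booleans per trial. A minimal sketch, assuming each trial is recorded as a pair (word is rewarded, participant answered "rewarded"); the function name and the trial encoding are hypothetical and not taken from the manuscript:

```python
from collections import Counter


def classify_response(word_is_rewarded, answered_rewarded):
    """Map one trial onto the four answer types of the feedback matrix."""
    if word_is_rewarded:
        return "hit" if answered_rewarded else "miss"
    return "false_alarm" if answered_rewarded else "correct_rejection"


# Hypothetical trials: (word_is_rewarded, answered_rewarded)
trials = [(True, True), (True, False), (False, False), (False, True), (True, True)]
counts = Counter(classify_response(rewarded, answer) for rewarded, answer in trials)
print(dict(counts))
```

      Because the button-to-meaning assignment differed across subjects, the raw left/right press would first have to be translated into "answered rewarded" per participant before this mapping applies.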

      We added the following sentences to the methods on p.19, ll. 687-692:  

      As a two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points).

      Comment 4:  

      (3) Relatedly, it is unclear how reward or lack thereof would translate cleanly into a categorisation of hits/misses/correct rejections/false alarms, as explained in the text and shown in Figure 1D. If the item was of the non-rewarded class and the participant got it correct, they avoided loss. Why would that be considered a correct rejection, as the text suggests? It is no less of a hit than the rewarded-correct, it's just the trial was set up in a way that limits gains. This seems to mix together signal detection nomenclature (in which reward is uniform and there are two options, one of which is correct and one isn't) and loss-aversion types of studies (in which reward is different for two types of stimuli, but for each type you can have H/M/CR/FA separably). Again, it might all stem from me not understanding the task, but at the very least this required extended explanations. Once the authors address this, they should also update Fig 1D. This complexity makes the results relatively hard to interpret and the merit of the manuscript hard to assess. Unless there are strong hypotheses about reward's impact on memory (which, as far as I can see, are not at the core of the paper), there should be no difference in the manner in which the currently labelled "hits" and "CR" are deemed - both are correct memories. Treating them differently may have implications on the d', which is the main memory measure in the paper, and possibly on measures of decision bias that are used as well.

      We thank the reviewer for this comment, which gives us the opportunity to clarify. As explained in the previous comment, for our two-alternative forced-choice task, we instructed the participants to press one button when they thought the presented word was rewarded and the other button when they thought the word was unrewarded. Based on this instruction, we applied signal detection theory (SDT), because the subjects had the task of detecting when reward was present and rejecting when reward was absent. Therefore, we considered correct responses to words of the rewarded category as hits and to words of the unrewarded category as correct rejections (see Table below). However, the reviewer is correct, because in addition to false alarms, we punished incorrect responses here by subtracting money points, to control for alternative task strategies of the participants other than reward association learning of words. We agree that further explanation/argumentation to introduce our nomenclature is necessary.

      Author response table 1.

      We adjusted the results section on p.5, ll. 169-177:

      To obtain a measurement of discrimination memory with respect to the potential influence of the response bias, we applied signal detection theory (Green and Swets, 1966). Because we instructed the participants to respond to each word by left- or right-hand button presses, and one button means reward is present whereas the other button means reward is absent, we considered correct responses to words of the rewarded category as hits and to words of the unrewarded category as correct rejections. Accordingly, we assigned the responses with regard to the reward associations of the words to the following four response types: hits (rewarded, correct); correct rejections (unrewarded, correct); misses (rewarded, incorrect); and false alarms (unrewarded, incorrect). Depending on their responses, subjects received money points (Fig. 1d).
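
      For readers less familiar with signal detection theory, the two measures used throughout the response, d' and the c-criterion, can be computed from the four response counts as sketched below. This uses the textbook definitions d' = z(H) - z(FA) and c = -(z(H) + z(FA)) / 2. The log-linear correction (adding 0.5 to each cell) is an assumption made here so that rates of exactly 0 or 1 do not produce infinite z-scores; the response does not say which correction, if any, the authors applied.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from response counts using standard signal detection formulas."""
    # Log-linear correction: an assumption of this sketch, not the authors' stated method.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for one participant and one word category
d_prime, criterion = dprime_and_criterion(hits=15, misses=5, false_alarms=5, correct_rejections=15)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

      With this sign convention, c > 0 indicates a tendency to answer "unrewarded" and c < 0 a tendency to answer "rewarded", independently of how well the two categories are discriminated.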

      Comment 5:

      (4) The study starts off with a sample size of N=39 but excludes 17 participants for some crucial analyses. This is a high number, and it's not entirely clear from the text whether exclusion criteria were pre-registered or decided upon before looking at the data. Having said that, some criteria seem very reasonable (e.g., excluding participants who were not fully exposed to words during sleep). It would still be helpful to see that the trend remains when including all participants who had sufficient exposure during sleep. Also, please carefully mention for each analysis what the N was.

      Our study was not pre-registered. Including all the subjects independent of low prememory performance, but with respect to a decent number of reactivations (> 160 reactivations, every word at least 2 times), resulted in a new dataset with 15 and 13 participants of the high- and low-PP cueing condition, respectively. Here, statistical analyses revealed no significant overnight change anymore in memory performance in the high-PP cueing condition (Δ memory (d'): t(14) = 1.67, p = 0.12), whereas the increase of the bias in decision making towards risk avoidance still remained significant (Δ bias (c-criterion): t(14) = 3.36, p = 0.005).

      We modified and added the following sentences to the discussion on p.13, ll. 456-458:

      Our study has limitations due to a small sample size and between-subject comparisons. The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      Comment 6:             

      (5) Relatedly, the final N is low for a between-subjects study (N=11 per group). This is adequately mentioned as a limitation, but since it does qualify the results, it seemed important to mention it in the public review.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-488: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 7:

      (6) The linguistic statistics used for establishing the artificial words are all based on American English, and are therefore in misalignment with the spoken language of the participants (which was German). The authors should address this limitation and discuss possible differences between the languages. Also, if the authors checked whether participants were fluent in English they should report these results and possibly consider them in their analyses. In all fairness, the behavioural effects presented in Figure 2A are convincing, providing a valuable manipulation test.

      We thank the reviewer for pointing to the misalignment between the German-speaking participants and the artificial words based on American English. Further, we did not assess the English language proficiency of the participants to control for it as a potential confounder, although comparative control analyses revealed no significant differences between the two cueing groups in pre-sleep memory performance (see Table S1).

      We now discuss these comments as limitations on p.14, ll. 473-481:

      Further, we used artificial words based on American English in combination with German-speaking participants, whereas language differences in pronunciation and phoneme structure might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages are considered to belong to the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavioral analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities, providing evidence of successful experimental manipulation.

      Comment 8:

      (7) With regard to the higher probability of nested spindles for the high- vs low-PP cueing conditions, the authors should try and explore whether what the results show is a general increase for spindles altogether (as has been reported in the past to be correlated with TMR benefit and sleep more generally) or a specific increase in nested spindles (with no significant change in the absolute numbers of post-cue spindles). In both cases, the results would be interesting, but differentiating the two is necessary in order to make the claim that nesting is what increased rather than spindle density altogether, regardless of the SW phase.

      We conducted additional analyses based on detected sleep spindles to provide additional data according to this question. 

      We added the following section to the supplementary data on pp. 31-32, ll. 1007-1045:  

      After conducting a sleep spindle detection (frequency range of 12-16 Hz, see methods for details), we compared the sleep spindle density between the TMR conditions of high- and low-PP, showing no significant difference (see Fig. S8a and Table S9). Next, we subdivided the detected sleep spindles into sleep spindles coupled and uncoupled with the previously detected slow waves (SW; analyses of Fig. 4). Sleep spindles were defined as coupled when their amplitude peak occurred during the SW up-state phase (0.3 to 0.8 s time-locked to the SW troughs). A two-way mixed design ANOVA on the amplitude size of the sleep spindles with the cueing group as a between-subject factor (high-PP-cued vs. low-PP-cued) and SW-coupling as a within-subject factor (coupled vs. uncoupled) showed a significant interaction effect (cueing group × SW-coupling: F(1,20) = 4.51, p = 0.046, η2 = 0.18), a significant main effect of SW-coupling (F(1,20) = 85.02, p < 0.001, η2 = 0.81), and a trend toward significance for the main effect of the cueing group (F(1,20) = 3.54, p = 0.08). Post-hoc unpaired t-tests revealed a significantly higher amplitude size of the coupled sleep spindles in the high- compared to the low-PP cueing group (t(20) = 2.13, p = 0.046, Cohen’s d = 0.91; Fig. S8b) and no significant group difference for the uncoupled sleep spindles (t(20) = 1.62, p = 0.12). An additional comparison of the number of coupled sleep spindles between the cueing groups revealed no significant difference (see Table S9).

      Here, we found that detected sleep spindles coupled to the SW up-state phase occurred with higher amplitude after TMR presentations of the high-PP words in comparison to the low-PP words, whereas the sleep spindle density and the number of sleep spindles coupled to the SW up-state phase did not differ between the cueing conditions.
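
      The reported effect sizes can be reproduced from the test statistics alone, which is a useful sanity check when reading this paragraph. The sketch below assumes that η2 denotes partial eta squared and that the two cueing groups have 11 participants each (consistent with the group size given elsewhere in the reviews and with 20 degrees of freedom); both formulas are standard conversions, and the function names are hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def cohens_d_from_t(t_value, n1, n2):
    """Cohen's d for an independent-samples t-test with group sizes n1 and n2."""
    return t_value * math.sqrt(1 / n1 + 1 / n2)


eta_interaction = partial_eta_squared(4.51, 1, 20)   # cueing group x SW-coupling
eta_coupling = partial_eta_squared(85.02, 1, 20)     # main effect of SW-coupling
d_coupled = cohens_d_from_t(2.13, 11, 11)            # coupled spindles, high- vs. low-PP

print(round(eta_interaction, 2), round(eta_coupling, 2), round(d_coupled, 2))
```

      Under these assumptions the recomputed values round to 0.18, 0.81 and 0.91, matching the figures quoted in the text.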

      We added the following sentences to the methods on pp. 22-23, ll. 822-839:  

      Sleep spindle analyses 

      We detected fast sleep spindles by band-pass filtering (12-16Hz) the signal of the Pz electrode during the auditory cueing trials in the time windows of -2 to 8s according to stimulus onsets. The amplitude threshold was calculated individually for each subject as 1.25 standard deviations (SDs) from the mean. The beginning and end times of the sleep spindles were then defined as the points at which the amplitude fell below 0.75 SDs before and after the detected sleep spindle. Only sleep spindles with a duration of 0.5-3 s were included in subsequent analyses. 
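
      The thresholding step described above can be sketched as follows. This is an illustration under several assumptions that the text does not settle: the input is taken to be an amplitude envelope of the already band-pass-filtered Pz signal, "1.25 SDs from the mean" is read as mean + 1.25 SD of that envelope, and the 0.75 SD boundary is read as mean + 0.75 SD. The function name and the synthetic envelope are hypothetical.

```python
from statistics import mean, pstdev


def detect_spindles(envelope, sfreq, detect_sd=1.25, bound_sd=0.75,
                    min_dur=0.5, max_dur=3.0):
    """Return (start_s, end_s) of candidate spindles in an amplitude envelope."""
    mu, sd = mean(envelope), pstdev(envelope)
    detect_thr = mu + detect_sd * sd    # a sample above this marks a candidate event
    bound_thr = mu + bound_sd * sd      # the event extends while the envelope stays above this
    events, i, n = [], 0, len(envelope)
    while i < n:
        if envelope[i] > detect_thr:
            start = i
            while start > 0 and envelope[start - 1] >= bound_thr:
                start -= 1
            end = i
            while end < n - 1 and envelope[end + 1] >= bound_thr:
                end += 1
            duration = (end - start + 1) / sfreq
            if min_dur <= duration <= max_dur:   # keep only 0.5-3 s events
                events.append((start / sfreq, (end + 1) / sfreq))
            i = end + 1
        else:
            i += 1
    return events


# Synthetic envelope at 100 Hz: flat baseline, one 1 s burst and one 0.2 s burst
sfreq = 100
envelope = [1.0] * 1000
envelope[300:400] = [5.0] * 100   # 1.0 s burst: kept
envelope[700:720] = [5.0] * 20    # 0.2 s burst: rejected as too short
events = detect_spindles(envelope, sfreq)
print(events)
```

      Real data would additionally need the 12-16 Hz band-pass filter and an envelope estimate (for example a Hilbert transform or a moving RMS), which are deliberately left out of this sketch.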

      To compare the sleep spindle densities between the different cueing conditions of high- and low-PP, we computed the grand average sleep spindle density distribution in number per trial with a bin size of 0.5s from -0.5 to 6s time-locked to stimulus onset in each condition (see Fig. S8a and Table S9).     
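
      The density measure described here is a per-trial histogram. A minimal sketch, assuming each spindle is represented by a single time point relative to stimulus onset (the text does not say whether onset or peak time was used) and that the data are held as one list of spindle times per cueing trial; the names are hypothetical:

```python
def spindle_density(times_per_trial, t_min=-0.5, t_max=6.0, bin_size=0.5):
    """Average number of spindles per trial in consecutive time bins."""
    n_bins = round((t_max - t_min) / bin_size)
    counts = [0] * n_bins
    for times in times_per_trial:
        for t in times:
            if t_min <= t < t_max:
                counts[int((t - t_min) // bin_size)] += 1
    n_trials = len(times_per_trial)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    return [c / n_trials for c in counts]


# Hypothetical spindle times (s, relative to cue onset) for three trials
trials = [[0.2, 1.6], [0.3], []]
density = spindle_density(trials)
print(len(density), density)
```

      Computing this separately for high- and low-PP cueing trials gives the two distributions that are then compared between conditions.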

      Based on the detected slow waves and sleep spindles, we defined coupling events as those in which the positive amplitude peak of a detected sleep spindle occurred during the slow wave up-state phase, in a time window of 0.3 to 0.8 s relative to the trough of a slow wave.
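
      The coupling rule reduces to a time-window test between spindle peaks and slow-wave troughs. The sketch below assumes both are available as lists of times in seconds on a common clock; the function names are hypothetical.

```python
def is_coupled(spindle_peak, sw_troughs, window=(0.3, 0.8)):
    """True if the spindle peak falls 0.3-0.8 s after any slow-wave trough."""
    return any(window[0] <= spindle_peak - trough <= window[1] for trough in sw_troughs)


def split_by_coupling(spindle_peaks, sw_troughs):
    """Partition spindle peak times into (coupled, uncoupled)."""
    coupled = [p for p in spindle_peaks if is_coupled(p, sw_troughs)]
    uncoupled = [p for p in spindle_peaks if not is_coupled(p, sw_troughs)]
    return coupled, uncoupled


# Hypothetical event times in seconds
troughs = [1.0, 5.0]
peaks = [1.5, 2.5, 5.25, 5.5]
coupled, uncoupled = split_by_coupling(peaks, troughs)
print(coupled, uncoupled)
```

      The counts of the two lists correspond to the "amount of coupled sleep spindles" compared between cueing groups, and the amplitude comparison is then run separately on each list.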

      We computed the averaged amplitude size of each detected sleep spindle by calculating the mean of the absolute amplitude values of all negative and positive peaks within a detected sleep spindle (see Fig. S8b).
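
      One way to read this amplitude measure is as the mean absolute value of all local maxima and minima of the filtered signal within one spindle. The sketch below takes that reading, which is an interpretation rather than the authors' confirmed procedure, and uses a simple three-point test for local extrema; the function name is hypothetical.

```python
def mean_peak_amplitude(samples):
    """Mean absolute value of the local maxima and minima within one spindle."""
    peaks = [
        abs(samples[i])
        for i in range(1, len(samples) - 1)
        if (samples[i] > samples[i - 1] and samples[i] > samples[i + 1])
        or (samples[i] < samples[i - 1] and samples[i] < samples[i + 1])
    ]
    if not peaks:
        raise ValueError("no local extrema found in this segment")
    return sum(peaks) / len(peaks)


# Hypothetical filtered samples of one spindle (arbitrary units)
segment = [0, 2, 0, -4, 0, 3, 0]
amplitude = mean_peak_amplitude(segment)
print(amplitude)
```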

      We added the following sentences to the results on p.10, ll. 338-343:  

      By conducting additional analyses based on the detection of fast sleep spindles (12-16 Hz; see methods), we confirmed that fast sleep spindles during the SW up-states (from 0.3 to 0.8 s after the SW trough) occurred with significantly higher amplitude after the cueing presentation of high- compared to low-PP words, whereas the sleep spindle density and the number of sleep spindles coupled to the SW up-state did not differ between the cueing conditions (see Fig. S8 and Table S9).

      Reviewer #2 (Public Review):

      Summary:

      The work by Klaassen & Rasch investigates the influence of word learning difficulty on sleep-associated consolidation and reactivation. They elicited reactivation during sleep by applying targeted memory reactivation (TMR) and manipulated word learning difficulty by creating words more similar (easy) or more dissimilar (difficult) to our language. In one group of participants, they applied TMR of easy words and in another group of participants, they applied TMR of difficult words (between-subjects design). They showed that TMR leads to higher memory benefits in the easy compared to the difficult word group. On a neural level, they showed an increase in spindle power (in the up-state of an evoked response) when easy words were presented during sleep.

      Comment 9:

      Strengths:

      The authors investigate a research question relevant to the field, that is, which experiences are actually consolidated during sleep. To address this question, they developed an innovative task and manipulated difficulty in an elegant way.

      Overall, the paper is clearly structured, and results and methods are described in an understandable way. The analysis approach is solid.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 10:

      (1) Sample size

      For a between-subjects design, the sample size is too small (N = 22). The main finding (also found in the title "Difficulty in artificial word learning impacts targeted memory reactivation") is based on an independent samples t-test with 11 participants/group.

      The authors explicitly mention the small sample size and the between-subjects design as a limitation in their discussion. Nevertheless, making meaningful inferences based on studies with such a small sample size is difficult, if not impossible.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-473: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 11:

      (2) Choice of task

      Though the task itself is innovative, there would have been tasks better suited to address the research question. The main disadvantage of the task and of the operationalisation of memory performance (d') is that single-trial performance cannot be calculated. Consequently, choosing individual items for TMR is not possible.

      Additionally, TMR of low vs. high difficulty is conducted between subjects (and independently of pre-sleep memory performance) which is a consequence of the task design.

      The motivation for why this task has been used is missing in the paper.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their learning and memory task performances. Furthermore, we designed the task with the overall possibility of translating it to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To consider the beneficial effect of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words to gain and to avoid losses of money points.  

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors investigated the effects of targeted memory reactivation (TMR) during sleep on memory retention for artificial words with varying levels of phonotactical similarity to real words. The authors report that the high phonotactic probability (PP) words showed a more pronounced EEG alpha decrease during encoding and were more easily learned than the low PP words. Following TMR during sleep, participants who had been cued with the high PP TMR, remembered those words better than 0, whilst no such difference was found in the other conditions. Accordingly, the authors report higher EEG spindle band power during slow-wave up-states for the high PP as compared to low PP TMR trials. Overall, the authors conclude that artificial words that are easier to learn, benefit more from TMR than those which are difficult to learn.

      Comment 12 & 13:

      Strengths:

      (1) The authors have carefully designed the artificial stimuli to investigate the effectiveness of TMR on words that are easy to learn and difficult to learn due to their levels of similarity with prior word-sound knowledge. Their approach of varying the level of phonotactic probability enables them to have better control over phonotactical familiarity than in a natural language, and thus to disentangle which properties of word learning contribute to TMR success.

      (2) The use of EEG during wakeful encoding and sleep TMR sheds new light on the neural correlates of high PP vs. low PP both during wakeful encoding and cue-induced retrieval during sleep.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 14:

      (1) The present analyses are based on a small sample and comparisons between participants. Considering that the TMR benefits are based on changes in memory categorization between participants, it could be argued that the individuals in the high PP group were more susceptible to TMR than those in the low PP group for reasons other than the phonotactic probabilities of the stimuli (e.g., these individuals might be more attentive to sounds in the environment during sleep). While the authors acknowledge the small sample size and between-subjects comparison as a limitation, a discussion of an alternative interpretation of the data is missing.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. We thank the reviewer for this helpful comment and now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion on p.14, ll. 465-473: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 15:

      (2) While the one-tailed comparison between the high PP condition and 0 is significant, the ANOVA comparing the four conditions (between subjects: cued/non-cued, within-subjects: high/low PP) does not show a significant effect. With a non-significant interaction, I would consider it statistically inappropriate to conduct post-hoc tests comparing the conditions against each other. Furthermore, it is unclear whether the p-values reported for the t-tests have been corrected for multiple comparisons. Thus, these findings should be interpreted with caution.

      We thank the reviewer for this comment, which gives us the opportunity to correct our analyses and clarify them with additional description. Indeed, we first investigated overnight changes in behavioral performance within the four conditions by conducting t-tests against 0 on the Δ-values of d' and c-criterion. Whereas for all our statistical analyses the p-value was set at p < 0.05 for two-tailed testing, we did not correct the p-values of our behavior analyses for multiple comparisons. To subsequently investigate differences between conditions, we conducted additional ANOVAs. We agree with the reviewer that without significant ANOVA results, post-hoc analyses should not be conducted. Taking into account the recommendation of reviewer 1 as well, we now include post-hoc pairwise comparisons only when the interaction effect of the ANOVA revealed at least a trend of significance (p < 0.1).

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Further, we mentioned the lack of correction for multiple comparisons as a limitation of our results in the discussion on p.13, ll. 456-458:  

      The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      We added the following sentences to the methods p.23, ll. 842-849:

      To analyze overnight changes of sleep behavioral data within TMR conditions, we conducted at first dependent sample t-tests against 0 of Δ-values (post-sleep test minus pre-sleep test) of d' and c-criterion (see Fig. 3). Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavior statistical analyses, the p-value was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.
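
      The Δ-values described above can be illustrated with a short sketch. It is not the analysis code used in the study: the function name `sdt_measures`, the example response counts, and the log-linear correction for extreme rates are all assumptions introduced here, and the formulas are the standard equal-variance signal detection definitions of d' and c.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) from raw response counts (hypothetical helper)."""
    # Assumption: log-linear correction (add 0.5 per cell) so that rates of
    # exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)        # discrimination
    c = -0.5 * (z(hit_rate) + z(fa_rate))     # decision bias (criterion)
    return d_prime, c


# Made-up counts for one participant and one condition.
pre_d, pre_c = sdt_measures(14, 6, 5, 15)    # pre-sleep test
post_d, post_c = sdt_measures(16, 4, 3, 17)  # post-sleep test

# Overnight change: post-sleep minus pre-sleep.
delta_d = post_d - pre_d
delta_c = post_c - pre_c
```

      With this sign convention, a positive c indicates a conservative bias, that is, a tendency to withhold the "rewarded" response.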

      Comment 16:

      (3) With the assumption that the artificial words in the study have different levels of phonotactic similarity to prior word-sound knowledge, it was surprising to find that the phonotactic probabilities were calculated based on an American English lexicon whilst the participants were German speakers. While it may be the case that the between-language lexicons overlap, it would be reassuring to see some evidence of this, as the level of phonotactic probability is a key manipulation in the study.

      We thank the reviewer for pointing to the misalignment between the German-speaking participants and the artificial words based on American English. In line with this recommendation, we added a more detailed argument to the manuscript regarding our assumption that major common phonetic characteristics are preserved across both languages.

      We now discuss these aspects on p.14, ll. 473-481:

      Further, we used artificial words based on American English in combination with German speaking participants, whereas language differences of pronunciation and phoneme structures might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages are considered to have the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared for example to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavior analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities providing evidence of successful experimental manipulation. 

      Comment 17:

      (4) Another manipulation in the study is that participants learn whether the words are linked to a monetary reward or not, however, the rationale for this manipulation is unclear. For instance, it is unclear whether the authors expect the reward to interact with the TMR effects.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their learning and memory task performances. Furthermore, we designed the task so that it could in principle be translated to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To consider the beneficial effect of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words to gain and to avoid losses of money points.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 18:

      (1) Please clearly define all linguistics terms - and most importantly the term "phonotactics" - at first use.

      We thank the reviewer for this recommendation. We added a definition of phonotactics and reduced the diversity of linguistic terms to improve readability.

      We added the following sentences to the beginning of the introduction on p.3, ll. 72-76:

      One critical characteristic of similarity to pre-existing knowledge in auditory word processing is its speech sound (phoneme) pattern. In phonology as the field of language specific phoneme structures, phonotactics determines the constraints of word phoneme composition of a specific language.

      Comment 19:

      (2) Some critical details about the methods should be included in the Results section to make it comprehensible. For example, the way the crucial differences between G1-4 words should be addressed in the Results, not only in Figure 1.

      According to the recommendation, we added this information to the results section.

      We added the following sentences to the results section on p.4, ll. 145-154:

      To study the impact of difficulty in word learning on TMR, we developed a novel learning paradigm. We formed four sets of artificial words (40 words per set; see Table S3 and S4) consisting of different sequences of two vowels and two consonants. Here, we subdivided the alphabet into two groups of consonants (C1: b, c, d, f, g, h, j, k, l, m; C2: n, p, q, r, s, t, v, w, x, z) and vowels (V1: a, e, i; V2: o, u, y). Four-letter-words were created by selecting letters from the vowel and consonant groups according to four different sequences (G1: C1, V1, V2, C2; G2: C1, V1, C2, V2; G3: V1, C1, C2, V2; G4: V1, C1, V2, C2; Fig. 1a; see methods for further details). Comparison analyses between the sets revealed significant differences in phonotactic probability (PP; Fig. 1b; unpaired t-tests: G1 / G2 > G3 / G4, p < 0.005, values of Cohen’s d > 0.71).
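
      The construction rule for the four word sets can be sketched as follows. The letter groups and sequences are taken from the passage above; uniform random sampling, the fixed seed, and the names `make_word_set` and `word_sets` are assumptions made for illustration, since the actual selection procedure is described only in the methods. The sketch does not compute phonotactic probabilities.

```python
import random

# Letter groups as listed in the text.
GROUPS = {
    "C1": "bcdfghjklm",
    "C2": "npqrstvwxz",
    "V1": "aei",
    "V2": "ouy",
}

# Letter-group sequence for each word set (G1 to G4).
PATTERNS = {
    "G1": ("C1", "V1", "V2", "C2"),
    "G2": ("C1", "V1", "C2", "V2"),
    "G3": ("V1", "C1", "C2", "V2"),
    "G4": ("V1", "C1", "V2", "C2"),
}


def make_word_set(pattern, n_words, rng):
    """Draw n_words unique four-letter words that follow the given pattern."""
    words = set()
    while len(words) < n_words:
        # Assumption: each letter is drawn uniformly from its group.
        words.add("".join(rng.choice(GROUPS[group]) for group in pattern))
    return sorted(words)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
word_sets = {name: make_word_set(pattern, 40, rng)
             for name, pattern in PATTERNS.items()}
```

      Each pattern allows 10 × 3 × 3 × 10 = 900 distinct words, so drawing 40 unique words per set always terminates.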

      Comment 20:

      (3) Was scoring done both online and then verified offline? If so, please note that.

      We have now included this information.

      We adjusted the method section on p.21, ll. 765-769:   

      The sleep stages of NREM 1 to 3 (N1 to N3), wake, and REM sleep were scored offline and manually according to the criteria of the American Academy of Sleep Medicine (AASM) by visual inspection of the signals of the frontal, central, and occipital electrodes over 30s epochs (Iber et al., 2007). Based on offline scoring, we confirmed TMR exposure during N2 and N3 and no significant differences (p-values > 0.05) of sleep parameters between the cueing groups (see Table S2).  

      Comment 21:

      (4) In Figure 2, please arrange the panel letters in an easier-to-read way (e.g., label upper right panel b with a different letter).

      We have now rearranged the panel letters according to the recommendation.

      We adjusted Figure 2 on p.8, ll. 242-258:     

      Comment 22:

      (5) In the first paragraph on TMR effects, please note which memory measure you are comparing (i.e., d').

      We added this information according to the recommendation.  

      We adjusted the sentence of the results on p.8, ll. 260-263:

      To examine whether TMR during sleep impacts memory consolidation of discrimination learning with respect to learning difficulty, we calculated the overnight changes by subtracting the pre- from the post-sleep memory performance based on d'-values of the reactivated sequences (cued) and non-reactivated sequences (uncued).

      Comment 23:

      (6) Please show the pre-sleep and post-sleep test scores for both word categories (not only the delta). It may be best to show this as another data point in Fig 2a, but it may be helpful to also see this split between cued and uncued.

      We added the pre-sleep and post-sleep test scores with the individual data points as an additional figure. 

      We added the following figure to the supplementary data on p.28, ll. 936-940:  

      Comment 24:

      (7) In the sentence "An additional two-way mixed design ANOVA on the same values with cueing as a between-subject factor (cued vs. uncued) ...", a more exact phrasing for the last parentheses would probably be "(high-PP-Cued vs Low-PP-Cued)". Both groups were cued.

      We thank the reviewer for pointing this out. According to the recommendation, we corrected the descriptions of the two-way mixed design ANOVAs. In addition, we detected an incorrect assignment of conditions in the ANOVAs and corrected the reported values.

      We adjusted the sentences and corrected the values on p.9, ll. 271-275 and ll. 289-291: 

      An additional two-way mixed design ANOVA on the same values with the factor cueing (cued vs. uncued) as a within-subject factor and group as a between-subject factor revealed trends of significance (p < 0.1) for the interaction (cueing × group: F(1,20) = 3.47, p = 0.08) and the main effect of group (F(1,20) = 3.28, p = 0.09). The main effect of cueing was not significant (F(1,20) = 0.58, p = 0.46).

      An ANOVA on c-criterion changes showed no significant effects (interaction cueing × group: F(1,20) = 2.66, p = 0.12; main effect cueing: F(1,20) = 2.08, p = 0.17; main effect group: F(1,20) = 0.38, p = 0.55).

      Comment 25:

      (8) In the same ANOVA, please mention that there is a trend toward an interaction effect. If there wasn't one, the post-hoc comparison would be unwarranted. Please consider noting other p < 0.1 p-values as a trend as well, for consistency.

      Regarding this recommendation, we now include post-hoc pairwise comparisons only after confirming at least a trend toward an interaction effect in these ANOVAs, and we consistently report a p-value < 0.1 and > 0.05 as a trend of significance.

      We added the following sentences to the methods p.23, ll. 844-849:

      Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavior statistical analyses, the p-value was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Comment 26:      

      (9) Please consider adding an analysis correlating spindle power with memory benefit across participants. Even if it is non-significant, it is important to report given that some studies have found such a relationship.

      According to this recommendation, we conducted additional correlation analyses.

      We added the following sentences to the manuscript into the results (pp. 10-11, ll. 346-349), the discussion (p.12, ll. 413-417), and the methods (p.23, ll. 864-867):   

      Whereas we found a significant group difference in spindle power nested during SW up-states, conducting further whole sample (n = 22) correlation analyses between the individual spindle power values of the significant cluster and the overnight changes of behavior measurements revealed no significant correlations (Δ d': r = 0.16, p = 0.48; Δ c-criterion: r = 0.19, p = 0.40).

      In addition to our result of the significant group difference, we failed to find significant correlations between SW nested spindle power values and overnight changes in behavior measurements, whereas previous studies reported associations of SW and spindle activities during sleep with the integration of new memories in pre-existing knowledge networks (Tamminen et al., 2013, 2010).

      By using the same extracted power values (0.3 to 0.8s; 11-14Hz; Pz, P3, P4, O2, P7) per subject, we performed whole sample (n = 22) Pearson correlation analyses between these power values and the overnight changes of behavior measurements of the cued condition (Δ d' and Δ c-criterion).
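
      A whole-sample Pearson correlation of this kind can be sketched in plain Python. The helper names `pearson_r` and `t_from_r` and the example values are hypothetical and are not the study's data or code. The second helper converts r into the t-statistic with n - 2 degrees of freedom on which the two-tailed p-value is based.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t-statistic for testing r against 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Made-up values: spindle power and overnight change in d' per subject.
spindle_power = [0.12, 0.30, 0.25, 0.41, 0.18, 0.36]
delta_d_prime = [0.05, 0.20, -0.10, 0.15, 0.00, 0.30]
r = pearson_r(spindle_power, delta_d_prime)

# Plugging in the reported r = 0.16 with n = 22 gives a small t-value
# (20 degrees of freedom), in line with a non-significant correlation.
t_reported = t_from_r(0.16, 22)
```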

      Reviewer #2 (Recommendations For The Authors):

      (1) Choice of task

      Comment 27:      

      In general, I find your task well-designed and novel. In light of your research question, however, I wonder why you chose this task. When you outlined the research question in the introduction, I expected a task similar to Schreiner et al. (2015). For example, participants have to associate high PP words with each other and low PP words. The advantage here would be that you could test the benefits of TMR in a within-subjects design (for example, cueing half of the remembered high and half of the remembered low PP words).

      Please see our previous response to comment 14.

      Comment 28:

      Why did you decide to introduce a reward manipulation?

      Please see our previous response to comment 11.

      Comment 29:

      Why did you do the cueing on a category level (cueing all high PP or all low PP words instead of single word cueing or instead of cueing 20 reward high-PP, 20 unrewarded high-PP plus 20 reward low-PP and 20 unrewarded low-PP)? Both alternatives would have provided you the option to run your statistics within participants.

      Please see our previous response to comment 14.

      Comment 30:

      (2) Between-subjects design and small sample size.

      Why did you decide on a between-subjects design that severely reduces your power?

      Why did you just collect 22 participants with such a design? Were there any reasons for this small sample size? Honestly, I think publishing a TMR study with healthy participants and such a small sample size (11 participants for some comparisons) is not advisable.

      Please see our previous response to comment 14.

      Comment 31:

      (3) Encoding performance.

      Is d' significantly above 0 in the first repetition round? I would assume that the distinction between rewarded and non-rewarded words is just possible after the first round of feedback.

      Indeed, conducting t-tests against 0 revealed significantly increased d'-values in the first repetition round (2nd presentation) in both PP conditions (high-PP: 0.85 ± 0.09, t(32) = 9.17, p < 0.001; low-PP: 0.62 ± 0.09, t(32) = 6.83, p < 0.001).  

      Comment 32:

      (4) Encoding response options

      If you want to you could make it more explicit what exactly the response options are. I assume that one button means a word has a high reward and the other button means a word has a low reward. Making it explicit increases the understanding of the results section.

      Please see our previous response to comment 3.

      Comment 33:           

      (5) Alpha desynchronisation.

      Relative change

      Why did you subtract alpha power during the 1st presentation from alpha power during 2nd and 3rd presentation? You baseline-corrected already and individually included the 1st, 2nd, and 3rd repetition in your behavioural analysis.

      Based on this analysis, we aimed to examine the relative change in alpha power between PP-conditions of memory-relevant word repetitions. Therefore, to extract memory relevant changes of EEG activities, the first word presentation of naive stimulus processing could serve as a more representative baseline condition covering the time-window of interest of 0.7 to 1.9 s after the stimulus onset compared to a baseline condition before stimulus onset (-1 to -0.1s). 

      To explain the rationale of the analyses with the baseline condition more clearly, we added this information to the results section on p.7, ll. 222-226:

      We obtained the changes in power values by subtracting the first from the second and third presentation for the high- and low-PP condition, respectively. Here, the first word presentation of naive stimulus processing served us with a more representative baseline condition covering the time-window of interest of 0.7 to 1.9 s after the stimulus onset to examine relevant changes of encoding.  

      Comment 34:

      (6) Alpha desynchronisation as a neural correlate of encoding depth & difficulty?

      "In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth."

      Given that the low-PP words are more difficult to learn, I was expecting to see higher alpha desynchronisation in the low-PP relative to the high-PP words. Could you outline in a bit more detail how your findings fit into the literature (e.g., Simon Hanslmayr did a lot of work on this)?

      I would also advise you to add citations e.g., after your sentence in the quote above ("as an assumed neural correlate of encoding depth").

      We thank the reviewer for the recommendation giving us the opportunity to discuss in more detail how our results relate to previous findings. 

      We added additional sentences to the discussion on p.13, ll. 441-455:    

      Additional studies linked alpha desynchronization to cognitive effort and cognitive load (Proskovec et al., 2019; Zhu et al., 2021). So, one could assume to observe higher alpha desynchronization in the more difficult to learn condition of low-PP compared to high-PP. On the other hand, numerous studies investigating oscillatory correlates of learning and memory showed that alpha desynchronization is associated with memory across different tasks, modalities and experimental phases of encoding and retrieval (Griffiths et al., 2016, 2021, 2019a, 2019b; Hanslmayr et al., 2009; Michelmann et al., 2016). Strikingly, Griffiths and colleagues (Griffiths et al., 2019a) revealed by simultaneous EEG-fMRI recordings a negative correlation between the occurrence of patterns of stimulus-specific information detected by fMRI and cortical alpha/beta suppression. Here, the authors suggested that a decrease of alpha/beta oscillations might represent the neuronal mechanism of unmasking the task-critical signal by simultaneous suppression of task-irrelevant neuronal activities to promote information processing. Following this interpretation, we assume that over the course of learning elevated memory processing of the easier to learn stimuli is associated with enhanced information processing and thus accompanied by higher cortical alpha desynchronization in comparison to the more difficult to learn stimuli.

      In addition, we added citations to the quoted sentence on p.7, ll. 239-240:

      In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth (Griffiths et al., 2021; Hanslmayr et al., 2009).

      Comment 35:

      (7) Exclusion criterion.

      Why did you use a d' > 0.9 as a criterion for data inclusion?

      This criterion ensured that each included subject reached a pre-sleep memory performance of d' > 1.05 in at least one PP condition, which corresponds to a general accuracy rate of 70%.

      Accordingly, we adjusted these sentences of the method section on p.19, ll. 677-680: 

      Data were excluded from subjects who did not reach the minimal learning performance of d' > 1.05 during the pre-sleep memory test in at least one of the two PP conditions, whereas this threshold value corresponds to accuracy rates of 70% (n = 5). In addition, we excluded one subject who showed a negative d' in one PP condition of the pre-sleep memory test (n = 1). 
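
      The correspondence between d' = 1.05 and an accuracy of about 70% can be checked with a short calculation. The sketch assumes the equal-variance signal detection model, an unbiased observer (c = 0), and equal numbers of rewarded and unrewarded words, in which case accuracy equals Φ(d'/2). These assumptions and the function names are introduced here for illustration; the response does not state how the threshold was derived.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def accuracy_from_d_prime(d_prime):
    """Proportion correct for an unbiased observer: Phi(d' / 2)."""
    return STANDARD_NORMAL.cdf(d_prime / 2)


def d_prime_from_accuracy(accuracy):
    """Inverse mapping: d' = 2 * z(accuracy)."""
    return 2 * STANDARD_NORMAL.inv_cdf(accuracy)


# d' that corresponds to 70% correct under the assumptions above.
threshold = d_prime_from_accuracy(0.70)

# Accuracy implied by the exclusion threshold of d' = 1.05.
implied_accuracy = accuracy_from_d_prime(1.05)
```

      Under these assumptions the two values agree with the stated threshold to within rounding.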

      Comment 36:

      (8) Coherence of wording.

      When you talk about your dependent variable (d') you sometimes use sensitivity. I would stick to one term.

      We replaced the word sensitivity with d'.    

      (9) Criterion

      Comment 37:

      Why do you refer to a change in criterion (Figure 3b, axis labels) as a change in memory? Do you think the criterion says something about memory?

      We corrected the axis label of Figure 3b and deleted here the word memory.

      Comment 38:

      Additionally, why did you analyse the effect of TMR on the criterion? Do you expect the criterion to change due to sleep-dependent memory consolidation? This section would benefit from more explanation. Personally, I am very interested in your thoughts and your hypothesis (if you had one, if not that is also fine but then, make it explicit that it was an exploratory analysis).

      By conducting exploratory analyses of overnight changes of the c-criterion measurements, we aimed to examine the bias of decision-making to provide comprehensive data according to the framework of the signal detection theory. Regarding the previous literature showing mainly beneficial effects of sleep on learning and memory, we focused with our hypothesis on d' and explored additionally the c-criterion.

      Despite our task design with gains/hits of +10 money points and losses/FAs of -8 (instead of -10), the subjects showed already during the pre-sleep memory task significant biases towards loss avoidance in both PP conditions (t-tests against 0: high-PP: 0.44 ± 0.07, t(21) = 5.63, p < 0.001; low-PP: 0.47 ± 0.09, t(21) = 5.51, p < 0.001). As already reported in the preprint, we found an additional significant increase of c-criterion by TMR solely for the high-PP words (see Fig. 3b). Even by integrating subjects with poor pre-sleep memory performance (high-PP-cueing group: n = 15; low-PP-cueing group: n = 13), t-tests against 0 revealed a significant increase of the high-PP cueing condition (t(14) = 3.36, p = 0.005) and no significant overnight changes in the other conditions (high-PP uncued: t(12) = 1.39, p = 0.19; low-PP cued: t(12) = 1.47, p = 0.17; low-PP uncued: t(14) = -0.20, p = 0.84). These exploratory findings on c-criterion suggest potential applications of TMR to affect decision-making biases in combination with reward learning.      

      We revised the manuscript mentioning the exploratory character of the c-criterion analyses of the results on p.9, ll. 282-283 and of the discussion on p.12, ll. 400-402:  

      We examined next as an exploratory analysis whether TMR conditions influence biases in decision-making.

      By conducting an additional exploratory analysis, we observed a significant change of the decision bias in the cueing condition of the easy to learn words and no overnight changes in the other conditions.

      Comment 39:

      (10) You detected SWs in the time range of 0-6 sec post sound stimulation. How was the distribution of all detected SW down-states in this time range? (You could plot a histogram for this.)

      We have now illustrated the detected SWs in the time range of 0 to 6 s after stimulus onset.

      We added a histogram to the supplementary section on p.30, ll. 982-986:  

      Reviewer #3 (Recommendations For The Authors):

      Comment 40:

      (1) In line with the weakness outlined above, I would recommend including a discussion of how the between-subject comparison and small sample size could affect the results and provide alternative interpretations.

      Please see our previous response to comment 14.

      Comment 41:

      (2) Regarding my point about statistical comparisons, I would recommend that the authors follow best practice guidelines for post-hoc tests and multiple comparisons. In Figures 3a and b, I would also recommend removing the stars indicating significance from the post-hoc tests (if this is what they reflect). Perhaps this link will be useful: https://www.statology.org/anova-post-hoc-tests/

      Please see our previous response to comment 15.

      Comment 42:

      (3) Furthermore, to address any doubts about the possible phonotactic probability differences between languages, I would recommend that the authors show whether the languages overlap, the level of English fluency in the German-speaking participants, and/or another way of reassuring that this is unlikely to have affected the results.

      Please see our previous response to comment 7.

      Comment 43:

      (4) In the introduction, I would recommend that the authors outline a clear rationale for the reward/no reward manipulation.

      Please see our previous response to comment 11.

      Comment 44:

      (5) Figure 1c: Please include what response options participants had, e.g., 'rewarded/not rewarded'. This would make the type of categorization clearer to the reader.

      Please see our previous response to comment 3.

      Comment 45:

      (6) It is unclear whether the additional ANOVA conducted on the time and frequency of the identified clusters included all channels or only the channels contributing to the cluster. Consider clarifying this in the relevant methods and results. Furthermore, I would recommend labelling this as a post-hoc test as this analysis was guided by an initial peek at the data and the timings, frequencies, and channels of interest were not selected a-priori.

      We thank the reviewer for this recommendation and labelled the additional repeated-measure ANOVA as a post-hoc test. Further, we mentioned the channels used for this analysis (Pz and Cz).

      We adjusted the results section on p.7, ll. 230-233 and the methods section on p.23, ll. 858-860:            

      A post-hoc repeated-measure ANOVA on alpha power changes (merged over Pz and Cz electrodes) with PP (high vs. low) and presentations (2 to 3) as within-subjects factors revealed a main effect of PP (F(1,32) = 5.42, p = 0.03, η2 = 0.15), and a significant interaction (F(1,32)  = 7.38, p = 0.01, η2 = 0.19; Fig. 2e).

      After confirming the existence of a significant cluster, we conducted an additional post-hoc repeated-measure ANOVA with averaged values of the identified time and frequency range of interest and merged over the Pz and Cz electrodes (see Fig. 2e).

      Comment 46:

      (7) Figure 3: To better illustrate within- vs. between-subjects comparisons and promote transparency, please add individual points and lines between the within-subjects conditions.

      According to this recommendation, we changed Figure 3 to show the individual data points, connected by lines between the within-subject conditions.

      We modified Figure 3 on p.9, ll. 299-303:  

      Comment 47:

      (8) For the SW density time-bin analyses, please include statistics for all comparisons (i.e., through 0 s to 3 s) and say whether these were corrected for multiple comparisons.

      According to this recommendation, we have now included statistics for all comparisons.

      We added Table S6 to the supplementary data on p.29, l. 962:

      Comment 48:

      (9) Consider reporting effect sizes.

      We thank the reviewer for this recommendation and have now added effect sizes for significant results.

      Comment 49:

      (10) For transparency and replicability, consider including a list of the four stimulus sets including their phoneme and biphone probabilities.

      We included a list of the four stimulus sets with their phoneme and biphone probabilities.

      We added Table S3 and Table S4 to the supplementary data on pp. 26-27:

      References

      Asfestani MA, Brechtmann V, Santiago J, Peter A, Born J, Feld GB. 2020. Consolidation of Reward Memory during Sleep Does Not Require Dopaminergic Activation. J Cogn Neurosci 32:1688–1703. doi:10.1162/JOCN_A_01585

      Batterink LJ, Oudiette D, Reber PJ, Paller KA. 2014. Sleep facilitates learning a new linguistic rule. Neuropsychologia 65:169–79. doi:10.1016/j.neuropsychologia.2014.10.024

      Batterink LJ, Paller KA. 2017. Sleep-based memory processing facilitates grammatical generalization: Evidence from targeted memory reactivation. Brain Lang 167:83–93. doi:10.1016/J.BANDL.2015.09.003

      Bohn OS, Best CT. 2012. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. J Phon 40:109–128. doi:10.1016/J.WOCN.2011.08.002

      Cairney SA, Guttesen A á. V, El Marj N, Staresina BP. 2018. Memory Consolidation Is Linked to Spindle-Mediated Information Processing during Sleep. Curr Biol 28:948-954.e4. doi:10.1016/j.cub.2018.01.087

      Eberhard DM, Simons GF, Fennig CD. 2019. Ethnologue: Languages of the world. SIL International. Online version: http://www.ethnologue.com.

      Fischer S, Born J. 2009. Anticipated reward enhances offline learning during sleep. J Exp Psychol Learn Mem Cogn 35:1586–1593. doi:10.1037/A0017256

      Green DM, Swets JA. 1966. Signal detection theory and psychophysics. Oxford, England: John Wiley.

      Griffiths B, Mazaheri A, Debener S, Hanslmayr S. 2016. Brain oscillations track the formation of episodic memories in the real world. Neuroimage 143:256–266. doi:10.1016/j.neuroimage.2016.09.021

      Griffiths BJ, Martín-Buro MC, Staresina BP, Hanslmayr S, Staudigl T. 2021. Alpha/beta power decreases during episodic memory formation predict the magnitude of alpha/beta power decreases during subsequent retrieval. Neuropsychologia 153. doi:10.1016/j.neuropsychologia.2021.107755

      Griffiths BJ, Mayhew SD, Mullinger KJ, Jorge J, Charest I, Wimber M, Hanslmayr S. 2019a. Alpha/beta power decreases track the fidelity of stimulus specific information. Elife 8. doi:10.7554/eLife.49562

      Griffiths BJ, Parish G, Roux F, Michelmann S, van der Plas M, Kolibius LD, Chelvarajah R, Rollings DT, Sawlani V, Hamer H, Gollwitzer S, Kreiselmeyer G, Staresina B, Wimber M, Hanslmayr S. 2019b. Directional coupling of slow and fast hippocampal gamma with neocortical alpha/beta oscillations in human episodic memory. Proc Natl Acad Sci U S A 116:21834–21842. doi:10.1073/pnas.1914180116

      Hanslmayr S, Spitzer B, Bäuml K-H. 2009. Brain oscillations dissociate between semantic and nonsemantic encoding of episodic memories. Cereb Cortex 19:1631–40. doi:10.1093/cercor/bhn197

      Iber C, Ancoli‐Israel S, Chesson AL, Quan SF. 2007. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester, IL: American Academy of Sleep Medicine.

      Klaassen AL, Heiniger A, Sánchez PV, Harvey MA, Rainer G. 2021. Ventral pallidum regulates the default mode network, controlling transitions between internally and externally guided behavior. Proc Natl Acad Sci U S A 118:1–10. doi:10.1073/pnas.2103642118

      Lansink CS, Goltstein PM, Lankelma J V., McNaughton BL, Pennartz CMA. 2009. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol 7. doi:10.1371/JOURNAL.PBIO.1000173

      Luef EM, Resnik P. 2023. Phonotactic Probabilities and Sub-syllabic Segmentation in Language Learning. Theory Pract Second Lang Acquis 9:1–31. doi:10.31261/TAPSLA.12468

      Michelmann S, Bowman H, Hanslmayr S. 2016. The Temporal Signature of Memories: Identification of a General Mechanism for Dynamic Memory Replay in Humans. PLoS Biol 14:e1002528. doi:10.1371/journal.pbio.1002528

      Proskovec AL, Heinrichs-Graham E, Wilson TW. 2019. Load Modulates the Alpha and Beta Oscillatory Dynamics Serving Verbal Working Memory. Neuroimage 184:256. doi:10.1016/J.NEUROIMAGE.2018.09.022

      Reber AS. 1967. Implicit learning of artificial grammars. J Verbal Learning Verbal Behav 6:855–863. doi:10.1016/S0022-5371(67)80149-X

      Schreiner T, Rasch B. 2015. Boosting vocabulary learning by verbal cueing during sleep. Cereb Cortex 25:4169–4179. doi:10.1093/cercor/bhu139

      Sterpenich V, van Schie MKM, Catsiyannis M, Ramyead A, Perrig S, Yang H-D, Van De Ville D, Schwartz S. 2021. Reward biases spontaneous neural reactivation during sleep. Nat Commun 12:1–11. doi:10.1038/s41467-021-24357-5

      Tamminen J, Lambon Ralph MA, Lewis PA. 2013. The role of sleep spindles and slow-wave activity in integrating new information in semantic memory. J Neurosci 33:15376–15381. doi:10.1523/JNEUROSCI.5093-12.2013

      Tamminen J, Payne JD, Stickgold R, Wamsley EJ, Gaskell MG. 2010. Sleep spindle activity is associated with the integration of new memories and existing knowledge. J Neurosci 30:14356–60. doi:10.1523/JNEUROSCI.3028-10.2010

      Zhu Y, Wang Q, Zhang L. 2021. Study of EEG characteristics while solving scientific problems with different mental effort. Sci Rep 11. doi:10.1038/S41598-021-03321-9

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This important study explores the potential influence of physiologically relevant mechanical forces on the extrusion of vesicles from C. elegans neurons. The authors provide compelling evidence to support the idea that uterine distension can induce vesicular extrusion from adjacent neurons. The work would be strengthened by using an additional construct (preferably single-copy) to demonstrate that the observed phenotypes are not unique to a single transgenic reporter. Overall, this work will be of interest to neuroscientists and investigators in the extracellular vesicle and proteostasis fields. 

      We now include supporting data using a single copy alternate fluorescent reporter expressed in touch neurons (Fig. 3H).

      In brief, we examined the induction of exophergenesis in an alternative single-copy transgene strain that expresses mKate fluorescent protein specifically in touch receptor neurons. As compared to the multi-copy transgene that is broadly used in this study and expresses mCherry fluorescent protein specifically in touch receptor neurons, the mKate single-copy transgene is associated with a much lower frequency of exophergenesis. However, increasing uterine distension via blocking egg-laying can increase the exophergenesis of the mKate single-copy transgenic line from 0% to approximately 60% on adult day 1, indicating that the observed response is not tied to a single reporter.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors sought to understand the stage-dependent regulation of exophergenesis, a process thought to contribute to promoting neuronal proteostasis in C. elegans. Focusing on the ALMR neuron, they show that the frequency of exopher production correlates with the timing of reproduction. Using many genetic tools, they dissect the requirements of this pathway to eventually find that occupancy of the uterus acts as a signal to induce exophergenesis. Interestingly, the physical proximity of neurons to the egg zone correlates with exophergenesis frequency. The authors conclude that communication between the uterus and proximal neurons occurs through the sensing of mechanic forces of expansion normally provided by egg occupancy to coordinate exophergenesis with reproduction. 

      Strengths: 

      The genetic data presented is thorough and solid, and the observation is novel. 

      Weaknesses: 

      The main weakness of the study is that the detection of exophers is based on the overexpression of a fluorescent protein in touch neurons, and it is not clear whether this process is actually stimulated in wild-type animals, or if neurons have accumulated damaged proteins in relatively young day 2 animals. 

      We now include data using a single copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (new Fig. 3H), supporting that uterine distention, rather than reporter identity, is associated with early life exopher elevation. Data also add to our observations indicating that high protein-expressing strains generally produce higher baseline levels of exophers in early adulthood (for example, Melentijevic et al. (PMID 28178240) documented that mCherry RNAi knockdown in the strain primarily studied here can lower exopher levels).

      The second point raised here, regarding the occurrence and physiological role of early-adult exophers in “native” non-stressed neurons, is a fascinating question that we are beginning to address in continuing experiments. Readers will appreciate that quantifying relatively rare, “invisible” touch receptor neuron exophergenesis accurately without expressing a fluorescent reporter is technically challenging. Our speculation, now outlined a bit more clearly in the Discussion, is that certain molecular and organelle debris that cannot readily be degraded in cells during larval development may be stored until release to more capable degradative neighbors or to the coelomocytes for later management, as one component of the early adult transition in proteostasis (see J. Labbadia and R. I. Morimoto, PMID 24592319). Receiving cells may be primed for this at a particular timepoint, possibly analogous to the “bulky garbage” collection of over-sized difficult-to-dispose-of household items that a town will address with specialized action only at specific times. The prediction is that we should be able to detect some mass protein aggregation through early development, and at least partial elimination by adult day 3; this elimination should be impaired when eggs are eliminated. Initial testing is underway.

      Reviewer #2 (Public Review): 

      Summary: 

      This paper reports that mechanical stress from egg accumulation is a biological stimulus that drives the formation of extruded vesicles from the neurons of C. elegans ALMR touch neurons. Using powerful genetic experiments only readily available in the C. elegans system, the authors manipulate oocyte production, fertilization, embryo accumulation, and egg-laying behavior, providing convincing evidence that exopher production is driven by stretch-dependent feedback of fertilized, intact eggs in the adult uterus. Shifting the timing of egg production and egg laying alters the onset of observed exophers. Pharmacological manipulation of egg laying has the predicted effects, with animals retaining fewer eggs having fewer exophers and animals with increased egg accumulation having more. The authors show that egg production and accumulation have dramatic consequences for the viscera, and moving the ALMR process away from eggs prevents the formation of exophers. This effect is not unique to ALMR but is also observed in other touch neurons, with a clear bias toward neurons whose cell bodies are adjacent to the filled uterus. Embryos lacking an intact eggshell with reduced rigidity have impaired exopher production. Acute injection into the uterus to mimic the stretch that accompanies egg production causes a similar induction of exopher release. Together these results are consistent with a model where stretch caused by fertilized embryo accumulation, and not chemical signals from the eggs themselves or egg release, underlies ALMR exopher production seen in adult animals. 

      Strengths: 

      Overall, the experiments are very convincing, using a battery of RNAi and mutant approaches to distinguish direct from indirect effects. Indeed, these experiments provide a model generally for how one would methodically test different models for exopher production. The paper is well-written and easy to understand. I had been skeptical of the origin and purpose of exophers, concerned they were an artefact of imaging conditions, caused by deranged calcium activity under stressful conditions, or as evidence for impaired animal health overall. As this study addresses how and when they form in the animal using otherwise physiologically meaningful manipulations, the stage is now set to address at a cellular level how exophers like these are made and what their functions are. 

      Weaknesses: 

      Not many. The experiments are about as good as could be done. Some of the n's on the more difficult-to-work strains or experiments are comparatively low, but this is not a significant concern because of the number of different, complementary approaches used. The microinjection experiment in Figure 7 is very interesting, but there are missing details that would confirm whether this is a sound experiment. 

      We expanded the description of the microinjection experiment in both the figure legend and the Methods section to enhance clarity and substantiate the approach.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, the authors use the C. elegans system to explore how already-stressed neurons respond to additional mechanical stress. Exophers are large extracellular vesicles secreted by cells, which can contain protein aggregates and organelles. These can be a way of getting rid of cellular debris, but as they are endocytosed by other cells can also pass protein, lipid, and RNA to recipient cells. The authors find that when the uterus fills with eggs or otherwise expands, a nearby neuron (ALMR) is far more likely to secrete exophers. This paper highlights the importance of the mechanical environment in the behavior of neurons and may be relevant to the response of neurons exposed to traumatic injury. 

      Strengths: 

      The paper has a logical flow and a compelling narrative supported by crisp and clear figures. 

      The evidence that egg accumulation leads to exopher production is strong. The authors use a variety of genetic and pharmacological methods to show that increasing pressure leads to more exopher production, and reducing pressure leads to lower exopher production. For example, egg-laying defective animals, which retain eggs in the uterus, produce many more exophers, and hyperactive egg-laying is accompanied by low exopher production. The authors even inject fluid into the uterus and observe the production of exophers. 

      Weaknesses: 

      The main weakness of the paper is that it does not explore the molecular mechanism by which the mechanical signals are received or responded to by the neuron, but this could easily be the subject of a follow-up study. 

      We agree that the molecular mechanisms operative are of considerable interest, and our initial pursuit suggests that a comprehensive study will be required for satisfactory elaboration of how mechanical signals are received or responded to by the neuron.

      I was intrigued by this paper, and have many questions. I list a few below, which could be addressed in this paper or which could be the subject of follow-up studies. 

      - Why do such a low percentage of ALMR neurons produce exophers (5-20%)? Does it have to do with the variability of the proteostress? 

      We do not yet understand why some ALMR neurons within the same genotype will produce exophers and some will not. We know that in addition to the uterine occupation we report here, proteostasis compromise, feeding status, oxidative stress, and osmotic stress can elevate exopher numbers (PMID 34475208); cell-autonomous influences on exopher levels include aggresome-associated biology (PMID 37488107) and expression levels of the mCherry protein (PMID 28178240). Turek reports that social interaction on plates can influence muscle exopher levels (PMID 34288362). Thus, although variable proteostress experienced by neurons is likely a factor, we have not yet experimentally defined specific trigger rules. We suspect the summation of internal proteostasis crisis and environmental conditions, including particular force vectors/frequency, will underlie the variable exopher production phenomenon.

      - Why does the production of exophers lag the peak in progeny production by 24-48 hours? Especially when the injection method produces exophers right away?

      Progeny production can track well with exopher production (Fig. 1B), although the nature of egg counts (permanent, one-time events) vs. exophers (which are slowly degraded) can skew the peak scores apart. We synchronized animals at the L4 stage. Adult day 1 was 24 hours later, and we measured then and every subsequent 24 hours. The daily progeny count reflects the total number of progeny produced every 24 hours; exopher events were scored once a day, but exophers can persist such that the daily exopher count can partially reflect slow degradation, with some exophers being counted on two days. We now explain our scoring details better in the Methods section.

      The rapid appearance of exophers, as early as ~10 minutes after sustained injection, is fascinating and probably holds mechanistic implications for exopher biology. For one thing, we can infer that in the mCherry Ag2 background, touch neurons can be poised to extrude exophers, but that the pressure/push acts to trigger or license final expulsion. It is interesting that we needed to administer a sustained injection of two minutes to find an exopher increase (now better emphasized in the expanded Methods section). We speculate that multiple pressure events, or a sustained force vector, might be critical (like an egg slowly passing through?). Minimally, this assay may help us assign molecular roles to pathway components as we identify them moving forward. 

      - As mentioned in the discussion, it would be interesting to know if PEZO-1/PIEZO is required for uterine stretching to activate exophergenesis. pezo-1 animals accumulate crushed oocytes in the uterus. 

      We have begun to test the hypothesis that PEZO-1 is a signaling component for ALMR exophergenesis, initially using the N- and C-terminal pezo-1 deletion mutants as in Bai et al. (PMID 32490809). These pezo-1 mutants have a mild decrease in ALMR exophergenesis under normal conditions. However, vulva-less conditions in pezo-1N and pezo-1C increased ALMR exophergenesis from approximately 10% to 60%, similar to the response of wild-type worms to high mechanical stress. These data suggest that PEZO-1 is not a required player in mediating mechanical force-induced ALMR exophergenesis. We are currently testing genetic requirements for other known mechanosensors. We intend a comprehensive investigation of the molecular mechanisms of mechanical signaling in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -The study would be significantly strengthened by the addition of data detecting regulation of exophergenesis by uterine forces in a more physiological context, in the absence of overexpression of a toxic protein. In other words, is this a process that occurs naturally during reproduction, or is it specific to proteotoxic stress induced by overexpression? Perhaps the authors could repeat key experiments using a single copy transgene, and challenge the animals with exogenous proteotoxic stress if necessary.

      We now include data using a single copy alternate fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (Fig. 3H), supporting that uterine distention, rather than reporter identity or over-expression alone, drives early life exopher elevation.

      Also noteworthy is that we find exophergenesis in the single-copy transgenic line is only approximately 0.3% on adult day 2 (average in three trials, data not shown), which is much lower than the 5-20% exophergenesis rate typically observed in the multi-copy high expression mCherry transgenic line. Therefore, consequences of overexpression of mCherry likely potentiate exophergenesis.

      -The authors mention that exophergenesis has been described in muscle cells. Is this also dependent on the proximity to the uterus? It would have been interesting to include data on other cell types in the vicinity of the reproductive system.

      Yes, in interesting work on exophers produced by muscle, Turek et al. reported that muscle exopher events are mostly located in a region proximal to the uterus. Moreover, this work also documented that sterile hermaphrodites are associated with approximately 0% muscle exophergenesis, and egg retention in the uterus strongly increases muscle exophergenesis (PMID: 34288362).  

      -Is exophergenesis also induced by other forms of mechanical stress? For example, swimming.

      We have looked at crude treatments such as centrifugation or vortexing without observing changes in exopher levels. Our preliminary work indicates that swimming can increase exophergenesis, and this effect depends on the presence of eggs in the uterus. We appreciate the question, and expect to include documentation of alternative pressure screening in our planned future paper on molecular mechanisms.

      -In Figure 1E, the profile of exopher production for the control condition at 25°C is very similar to the profile observed at 20°C in Figure 1B. However, the profile of progeny production at 25°C is known to have an earlier peak of progeny production. Perhaps egg retention is differently correlated with progeny production at this temperature? The authors could easily test this.

      Overall, exophers (which degrade with time) and progeny counts (a fixed number) have slightly different temporal features, anchored in part by how long exophers or their “starry night” debris persist. Most exophers start to degrade within 1-6 hours (PMID: 36861960), but exopher debris can persist for more than 24 hours. An exopher event observed on day 1 may thus also be recorded at the day 2 time point, which leads to a higher frequency of exopher events on day 2 as compared to day 1.

      We have previously published on the impact of temperature on exopher number (Supplemental Figure 2 in PMID 34475208). In brief, increasing culture temperature for animals that are raised at a constant lifetime temperature modestly increases exopher number; a greater increase in exophers is observed under conditions in which animals were switched to a higher temperature in adult life, suggesting that changes in temperature (a mandatory part of the ts mutant studies) engage complex biology that modulates exopher production. Our previous data show that in a temperature shift to 25°C, the peak of exophers was at adult day 1. Here, Fig. 1B is at a constant temperature of 20°C; Fig. 1E has a temperature shift from 15°C to 25°C. That egg retention might be temperature-influenced is a plausible hypothesis, but given the complexities of temperature shifts for some mutants, we elected to defer drill-down on the temperature-exopher-egg relationship. 

      -It is not clear how to compare panels A and B in Figure 3. In panel A the males are present throughout the adult life of the hermaphrodites whereas in panel B the males are added in later life. Therefore, the effect of later-life mating on progeny production is not shown and the title of panel A in the legend is misleading. The authors need to perform a progeny count in the same conditions of mating presented in Figure 3B to allow direct comparison.

      As Reviewer 1 suggested, we performed a new progeny count now presented in new Fig. 3A, which more appropriately matches the study presented in Fig. 3B; legends adjusted.

      -On page 12, the authors state that the baseline of exophergenesis in rollers is 71%, but then attribute the 71% in Figure 4F to exophergenesis specifically in ALMR that is posterior to AVM. The authors need to clarify this point.

      Good catch on our error. The baseline of exophergenesis in rollers is ~40%, and we corrected the main text.

      -Considering the conclusion of Figure 2 that blocking embryonic events past the 4-cell stage does not impact exopher production, it would have been interesting to compare the uterine length for emb-8 and for mex-3, since it is quite intriguing that the former suppresses exopher production while the latter has no effect.

      We repeated the emb-8 and mex-3 RNAi for these studies and encountered variability in outcome for 2 cell stage disruption via emb-8 RNAi, which is consistent with the range of published endpoints for emb-8 RNAi. We elected to include these emb-8 findings in the figure legend 2G, but removed the RNAi data from the main text figure. mex-3 uterine measures are added to revised panels 5H, 6I.

      Reviewer #2 (Recommendations For The Authors): 

      -Leaving the worms in halocarbon oil for too long (e.g. 10 min) can desiccate and kill them. Did the authors take them out of the oil before analyzing exopher production? The authors refer to these as 'sustained injections' without much description beyond that. As the worms are very small, the flow rate needed for a sustained injection over 2 minutes must be very low - so low that the needle is in danger of being clogged. Do the authors have an estimate of how much fluid was injected or the overall flow rate? I realize the flow rate measured outside of the worm may not compare directly to that of a pressurized worm, but such estimates would be instructive, particularly if they can be related to the relative volume of the eggs the injection is trying to mimic.

      After injection or mock injection, we removed the animal from the oil and flipped it if necessary to observe the ALMR neuron on the NGM-agar plate. We have now expanded the description of the experimental details of injection, including the estimated flow rate, in the revised Methods section.

      - The authors describe the ALMR neurons as "proteostressed", but I am not clear on whether these neurons were treated in a unique procedure to induce such a state or if the authors are merely building on other observations that egg-laying adults are dedicating significant resources to egg production, so they must be proteostressed. If they are not inducing a proteostressed state in their experiments, the authors should refrain from describing their neurons and effects as depending on such a state.

      We revised to more explicitly feature published evidence that the ALMR neurons we track with mCherryAg2 bz166 are likely proteostressed. Overexpression of mCherry in bz166 is associated with enlargement of lysosomes and formation of large mCherry foci that often correspond to LAMP::GFP-positive structures in ALMR neurons (PMID: 28178240; PMID: 37488107). Marked changes in ultrastructure reflect TN stress in this background. These cellular features are not seen in wild type animals. We previously published that over-expression of mCherry, polyQ74, polyQ128, or Aβ1-42 (which enhance proteostress) increases exophers (PMID: 28178240). Likewise, most genetic compromise of different proteostasis branches (heat shock chaperones, proteasome, and autophagy) promotes exophergenesis, supporting exophergenesis as a response to proteostress. In sum, the mCherryAg2 bz166 neurons appear markedly stressed relative to a non-over-expressing line and produce more exophers. RNAi knockdown of the mCherry lowers exopher levels (PMID: 28178240).

      In response to reviewer comment, we added a study with a single copy mKate reporter (new data Fig. 3H). We find a very low baseline of exophers in this background. This would support that high autonomous compromise associated with over-expression influences exopher levels. Interestingly, however, we found that ALMR neurons expressing mKate under a single-copy transgene still exhibit excessive exopher production (>60%) under high mechanical stress (Fig. 3H). These data are consistent with ideas that mechanical stresses can enhance exopher production, and may markedly lower the threshold for exophergenesis in close-to-native stress level neurons.

      - The authors should include more details on the source and use of the RNAi, for example, if the clones were from the Ahringer RNAi library, made anew for this study, or both.

      We now add this information in the methods section.

      - I would be curious if the authors would similarly see an induction in exopher production after acute vulval muscle silencing with histamine. I'm not suggesting this experiment, but it may offer a way to induce exophers in a more controlled manner.

      This is a great suggestion that we will try in future studies.

      - I am not sure if Figure 5 needs to be a main figure in the paper or if it would be more appropriate as a supplement.

      We considered this suggestion, but we think that the strikingly strong correlation of uterus length and exopher levels is a major point of the story, and these data establish a metric that we will use moving forward to distinguish whether an exopher modulation disruption is more likely to act by modulation of reproduction or modulation of touch neuron biology. For this reason we elected to keep Figure 5 in the main text.

      Reviewer #3 (Recommendations For The Authors): 

      -The Statistics section in the methods should be expanded to describe the statistics used in the experiments that aren't nominal, of which there are many.

      We have updated and expanded the statistics section.

      -P.2 Line 49 spelling 'que' should be queue (I remember this by the useless queue of letters lined up after the 'q').

      Corrected 

      -The introduction has a bit too much information about oocyte maturation, not relevant to the study.

      We agree that the information about oocyte maturation is not critical for laying out the related experiments and cut this section to improve focus.

      -p.3 line 22: Some exophers are seen on Day 3, so this should be restated for accuracy.

      Corrected

      -p.3 line 26. Explain here why sperm is necessary (oocytes don't mature or ovulate effectively without sperm).

      We added this clarifying explanation.

      -p.3 line 44 Clarify in the spe-44 the oocytes are in the oviduct (not the uterus). Might be helpful to include a DIC image to accompany the helpful diagram in Figure 1D. 

      We added a sentence describing the impact of sperm absence on oocyte maturation, progression into the uterus, and retention in the gonad, with reference to PMID: 17472754. We were able to add a DIC image to the tightly packed Figure 1.

      In Supplemental Figure 6, we now include a field picture of oocyte retention in the sem-2 mutant and upon treatment of lin-39(RNAi).

      -p.5 line 3 in the Figure 1D legend; recommend delete 'light with' which is confusing and just refer to the sperm as dark dots. 

      Corrected

      -p.6 line 22-24 Check for alignment of the statements with Figure 2 (2F is cited, but it should be 2G).

      Corrected

      -p12 line 13-15; Many ALMRs not in the egg zone (70%) did not produce exophers - this is still quite a lot. It would be good to state this section in a more straightforward way (less leading the reader) and if possible to give a possible explanation.

      We modified the text to be less leading: “Thus, although ALMR soma positioning in the egg zone does not guarantee exophergenesis in the mCherryAg2 strain, the neurons that did make exophers were nearly always in the egg zone.”

      -p.15 paragraph 3 - clarify how uterine length was controlled for the overall body length of the worm.

      We did not systematically measure body length, but rather focused on uterine distention. It would be of interest to determine if length of the body correlates with uterine size, and then address how that relationship translates to exopher production but here our attention came to rest on the striking correlation of uterine length and number of exophers.

      -p.17 line 23-25; Could be stated more simply. 

      We adjusted the text: “Moreover, the oocyte retention was similarly efficacious in elevating exopher production to egg retention, increasing ALMR exophergenesis to approximately 80% in the sem-2(rf) mutant (Fig. 6C)”.

      -p.23 Line 4. I think by the time the reader reaches this sentence, the egg-coincident exophergenesis will not be 'puzzling'. 

      Agreed, corrected.

      -p.26, Line 22, Male 'mating', not 'matting'.

      Corrected.

      -Throughout, leave space between number and unit (this is not required for degree or percent, but be consistent). 

      Corrected.

    1. Author Response:

      We thank the reviewers for their careful reading of the manuscript and for their comments. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. It is true that this work is a first step towards understanding the molecular mechanisms underlying TNT formation, and that further biochemical and biophysical analyses will be necessary to elucidate CD9 and CD81 roles. It also provides a toolbox for the future identification of important TNT factors, and perhaps biological markers.

      However, we would like to better explain our choice of focusing on CD9 and CD81 in TNTs, given the fact that they are also expressed in EVPs. First, both were among the most abundant integral membrane proteins in TNTs, and overexpression of CD9 was previously shown to increase TNT number. However, a recent work directed by our coauthor E. Rubinstein clearly showed that the absence of CD9, CD81 or even both has minimal impact on the production or composition of EVs in MCF7 (Fan et al, Differential proteomics argues against a general role for CD9, CD81 or CD63 in the sorting of proteins into extracellular vesicles, J. Extracell Vesicles, 2023;12:12352. https://doi.org/10.1002/jev2.12352). This is in line with another recent publication (Tognoli, Commun biol 2023) and with our results showing that the concentration of EVPs was the same when CD9 was overexpressed, i.e. in conditions where TNT number and vesicle transfer were increased. Therefore, it is highly probable that the role of CD9 and CD81 in TNT vs. EVP formation is different, even if we cannot completely exclude a crosstalk between the two pathways.

      Regarding the importance of CD9 and CD81 in TNT formation, our results are consistent with a non-exclusive regulation of TNTs by these tetraspanins, and/or with partial compensation by as yet unknown factors in their absence. Interestingly, to our knowledge, none of the TNT regulators described in the literature has a complete inhibitory effect when knocked out. These results confirm that several pathways can converge to regulate TNTs and are consistent with cellular plasticity. So it is hard to say whether factors like CD9 and CD81, which regulate TNTs and have other functions in cells, are “key” or simply “important”.

      Finally, the model we present in Figure 7 is a schematic working model of possible CD9/CD81 roles, which is obviously simplified for ease of understanding. It is important to note that when we write “no TNT” above an empty space between 2 cells, this describes what is drawn, and corresponds to real conditions where fewer TNTs are detected. It was never our intention to over-interpret our data, but rather to make it clearer with this diagram, and we hope that reading the article will make this clear.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript describes a series of experiments using human intracranial neural recordings designed to evaluate the processing of self-generated speech in the setting of feedback delays. Specifically, the authors aim to address the question about the relationship between speech-induced suppression and feedback sensitivity in the auditory cortex, whose relationship has been conflicting in the literature. They found a correlation between speech suppression and feedback delay sensitivity, suggesting a common process. Additional controls were done for possible forward suppression/adaptation, as well as controlling for other confounds due to amplification, etc.

      Strengths:

      The primary strength of the manuscript is the use of human intracranial recording, which is a valuable resource and gives better spatial and temporal resolution than many other approaches. The use of delayed auditory feedback is also novel and has seen less attention than other forms of shifted feedback during vocalization. Analyses are robust, and include demonstrating a scaling of neural activity with the degree of feedback delay, and more robust evidence for error encoding than simply using a single feedback perturbation.

      Weaknesses:

      Some of the analyses performed differ from those used in past work, which limits the ability to directly compare the results. Notably, past work has compared feedback effects between production and listening, which was not done here. There were also some unusual effects in the data, such as increased activity with no feedback delay when wearing headphones, that the authors attempted to control for with additional experiments, but remain unclear. Confounds by behavioral results of delayed feedback are also unclear.

      Overall the work is well done and clearly explained. The manuscript addresses an area of some controversy and does so in a rigorous fashion, namely the correlation between speech-induced suppression and feedback sensitivity (or lack thereof). While the data presented overlaps that collected and used for a previous paper, this is expected given the rare commodity these neural recordings represent. Contrasting these results to previous ones using pitch-shifted feedback should spawn additional discussion and research, including verification of the previous finding, looking at how the brain encodes feedback during speech over multiple acoustic dimensions, and how this information can be used in speech motor control.

      We thank the reviewer for their comments and have addressed the concerns point by point in the section “Recommendation for Authors”.

      Reviewer #2 (Public Review):

      Summary:

      In "Speech-induced suppression and vocal feedback sensitivity in human cortex", Ozker and colleagues use intracranial EEG to understand audiomotor feedback during speech production using a speech production and delayed auditory feedback task. The purpose of the paper is to understand where and how speaker-induced suppression occurs, and whether this suppression might be related to feedback monitoring. First, they identified sites that showed auditory suppression during speech production using a single-word auditory repetition task and a visual reading task, then observed whether and how these electrodes show sensitivity to auditory feedback using a DAF paradigm. The stimuli were single words played auditorily or shown visually and repeated or read aloud by the participant. Neural data were recorded from regular- and high-density grids from the left and right hemispheres. The main findings were:

      • Speaker-induced suppression is strongest in the STG and MTG, and enhancement is generally seen in frontal/motor areas except for small regions of interest in the dorsal sensorimotor cortex and IFG, which can also show suppression.

      • Delayed auditory feedback, even when simultaneous, induces larger response amplitudes compared to the typical auditory word repetition and visual reading tasks. The authors presume this may be due to the effort and attention required to perform the DAF task.

      • The degree of speaker-induced suppression is correlated with sensitivity to delayed auditory feedback.

      • pSTG (behind TTS) is more strongly modulated by DAF than mid-anterior STG.

      Strengths:

      Overall, I found the manuscript to be clear, the methodology and statistics to be solid, and the findings mostly quite robust. The large number of participants with high-density coverage over both the left and right lateral hemispheres allows for a greater dissection of the topography of speaker-induced suppression and changes due to audiomotor feedback. The tasks were well-designed and controlled for repetition suppression and other potential caveats.

      Weaknesses:

      (1) In Figure 1D, it would make more sense to align the results to the onset of articulation rather than the onset of the auditory or visual cue, since the point is to show that the responses during articulation are relatively similar. In this form, the more obvious difference is that there is an auditory response to the auditory stimulus, and none to the visual, which is expected, but not what I think the authors want to convey.

      We agree with the reviewer. We have updated Figure 1 accordingly.

      (2) The DAF paradigm includes playing auditory feedback at 0, 50, 100, and 200 ms lag, and it is expected that some of these lags are more likely to induce dysfluencies than others. It would be helpful to include some analysis of whether the degree of suppression or enhancement varies by performance on the task, since some participants may find some lags more interfering than others.

      We thank the reviewer for this suggestion. In the original analysis, we calculated a Sensitivity Index for each electrode by correlating the high gamma response with the delay condition across trials. To address the reviewer’s question, we have now compared delay conditions in pairs (DAF0 vs DAF50, DAF0 vs DAF100, DAF0 vs DAF200, DAF50 vs DAF100, DAF50 vs DAF200 and DAF100 vs DAF200).

      Similar to our Suppression Index calculation, where we compared neural responses to the listening and speaking conditions, (Listen – Speak) / (Listen + Speak), we have now calculated the Sensitivity Index by comparing neural responses to two delay conditions as follows:

      e.g. Sensitivity Index = (DAF50 – DAF0) / (DAF50 + DAF0). We used the raw high gamma broadband signal power instead of percent signal change to ensure that the Sensitivity Index values varied between -1 and 1.
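
      As a minimal sketch of the two indices described above, assuming one mean raw high-gamma power value per electrode and condition (the function name and the example values are hypothetical and are not taken from the study's code):

```python
def contrast_index(a: float, b: float) -> float:
    """Normalized contrast (a - b) / (a + b).

    For non-negative inputs such as raw signal power, the result is
    bounded between -1 and 1. Percent signal change can be negative,
    so it would not guarantee this bound.
    """
    total = a + b
    if total == 0:
        raise ValueError("a + b must be non-zero")
    return (a - b) / total


# Hypothetical mean raw high-gamma power for one electrode (arbitrary units).
power = {"Listen": 4.0, "Speak": 1.0, "DAF0": 2.0, "DAF50": 3.0, "DAF200": 6.0}

# Suppression Index: (Listen - Speak) / (Listen + Speak)
supp_index = contrast_index(power["Listen"], power["Speak"])

# Pairwise Sensitivity Index, e.g. (DAF50 - DAF0) / (DAF50 + DAF0)
sens_50_vs_0 = contrast_index(power["DAF50"], power["DAF0"])
sens_200_vs_0 = contrast_index(power["DAF200"], power["DAF0"])
```

      With these made-up values, a larger response to the longer delay yields a larger positive index, and an electrode that responds more during listening than speaking yields a positive Suppression Index.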

      As shown in the figure below, even when we break down the analysis by feedback delay, we still find a significant association between suppression and sensitivity (except when we calculate sensitivity indices by comparing DAF50 and DAF100). The strongest correlation (Pearson’s correlation) was found when sensitivity indices were calculated by comparing DAF0 and DAF200.

      As the reviewer suggested, participants found DAF200 more interfering than the others and slowed down their speech the most (Articulation duration; DAF0: 0.698, DAF50: 0.726, DAF100: 0.737, and DAF200: 0.749 seconds; Ozker, Doyle et al. 2022).

      Author response image 1.

      (3) Figure 3 shows data from only two electrodes from one patient. An analysis of how amplitude changes as a function of the lag across all of the participants who performed this task would be helpful to see how replicable these patterns of activity are across patients. Is sensitivity to DAF always seen as a change in amplitude, or are there ever changes in latency as well? The analysis in Figure 4 gets at which electrodes are sensitive to DAF but does not give a sense of whether the temporal profile is similar to those shown in Figure 3.

      In Figure 4A, electrodes from all participants are color-coded to reflect the correlation between neural response amplitude and auditory feedback delay. A majority of auditory electrodes in the STG exhibit a positive correlation, indicating that response amplitude increases with increasing feedback delays. To demonstrate the replicability of the response patterns in Figure 3, here we show auditory responses averaged across 23 STG electrodes from 6 participants.

      Author response image 2.

      Response latency in auditory regions also increases with increasing auditory feedback delays. But this delayed auditory response to delayed auditory feedback is expected. In Figure 3, signals were aligned to the perceived auditory feedback onset; therefore, we don’t see the latency differences. Below we replotted the same responses by aligning the signal to the onset of articulation. It is now clearer that responses are delayed as the auditory feedback delay increases. This is because participants start speaking at time=0, but they hear their voice with a lag, so the response onset in these auditory regions is delayed.

      According to models of speech production, when there is a mismatch between expected and perceived auditory feedback, the auditory cortex encodes this mismatch with an enhanced response, reflecting an error signal. Therefore, we referred to changes in response amplitude as a measure of sensitivity to DAF.

      (4) While the sensitivity index helps to show whether increasing amounts of feedback delay are correlated with increased response enhancement, it is not sensitive to nonlinear changes as a function of feedback delay, and it is not clear from Figure 3 or 4 whether such relationships exist. A deeper investigation into the response types observed during DAF would help to clarify whether this is truly a linear relationship, dependent on behavioral errors, or something else.

      We compared responses to delay conditions in pairs in the analysis presented above (response #2). We hope these new results also clarify this issue and address the reviewer’s concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) While the correlation between SuppI and SensI is clear here (as opposed to Chang et al), it is unclear if this difference is a byproduct of how SensI was calculated (and not just different tasks). In that paper, the feedback sensitivity was calculated as a metric comparing feedback responses during production and listening, whereas here the SensI is a correlation coefficient during production only. If the data exists, it would be very helpful to also show an analysis similar to that used previously (i.e. comparing DAF effects in both production and playback, either in correlations or just the 200ms delay response). One could imagine that some differences are due to sensory properties, though it is certainly less clear what delay effects would be on listening compared to say pitch shift.

      We thank the reviewer for pointing this out. Indeed, the calculation of SensI is different in the two studies. In the Chang et al. study, SensI was calculated by comparing perturbed feedback responses during production and passive listening. This is a very meticulous approach as it controls for the acoustic properties of the auditory stimuli under both conditions.

      In our study, we didn’t have a passive listening condition. This would require recording the participants’ voice as they were speaking with DAF and playing it back to them in a subsequent passive listening condition. Therefore, we can’t completely eliminate the possibility that some differences are due to sensory properties. However, to address the reviewer’s concern, we examined the voice recordings of 8 participants for acoustic differences. Specifically, we compared voice intensities for different auditory feedback delays (0, 50, 100 and 200 ms) and found no significant differences (F=0, p=0.091).

      We think that the difference with the Chang et al. study is an important point to emphasize, therefore we now added in the Discussion:

      “In contrast, to replicate this finding in humans, a previous iEEG study by Chang et al. (Chang, Niziolek et al. 2013) used frequency-shifted feedback during vowel production and found that most suppressed auditory sites did not overlap with those sensitive to feedback alterations. Using DAF instead of frequency-shifted feedback, we demonstrated a significant overlap of two neural populations in the STG, along with a strong correlation between the degree of speech-induced suppression and sensitivity to auditory feedback. This discrepancy may be due to different methods of calculating sensitivity to altered feedback. In our study, sensitivity was determined by comparing responses to delayed and non-delayed feedback during production, whereas Chang et al. compared perturbed feedback responses during production and listening. One possibility is that our approach identifies a larger auditory neural population in the STG sensitive to altered feedback. Alternatively, it could indicate a larger population highly sensitive to temporal rather than spectral perturbations in auditory feedback. Thus, we observe a wide overlap of the two neural populations in the STG showing both speech-induced suppression and sensitivity to auditory feedback. Replaying a recording of the participants' own delayed voice back to them, which we were unable to complete in this study, would have made the results of the two studies more comparable while also completely eliminating the possibility of a sensory explanation for the observed response enhancement.”

      (2) I am still a bit unclear on how Experiment 4 is different than the no-delay condition in Experiment 3. Please clarify. Also, to be clear, in Experiments 1+2 the subjects were not wearing any headphones and had no additional sidetone?

      It is correct that participants were not wearing earphones in Experiments 1&2 (with no additional sidetone), and that they were wearing earphones in Experiments 3&4.

      For the “no delay” condition in the DAF experiment (Experiment 3), participants were wearing earphones and reading words with simultaneous auditory feedback. So, this condition was equivalent to visual word reading (Experiment 2), except participants were wearing earphones. Yet, neural responses were much larger for the “no delay” condition in the DAF experiment compared to visual word reading.

      We suspected that larger neural responses in the DAF experiment were caused by hearing auditory feedback through earphones. To test and control for this possibility, in a subset of participants, we ran an additional visual word reading experiment (Experiment 4) with earphones and used the same volume settings as in the DAF experiment. We found that response magnitudes were now similar in the two experiments (Experiment 3 and 4) and earphones (with the associated increased sound amplitude) were indeed the reason for larger neural responses. Thus, Experiment 4 differs from the no-delay condition in Experiment 3 only in the stimuli read aloud.

      (3) In Figure 3, why is the DAF200 condition activity so much bigger than the other conditions, even prior to the DAF onset? I worry this might bias the rest of the response differences.

      In Figure 3B and 3D, time=0 indicates the onset of the perceived auditory feedback. Below we replotted the responses in the same two electrodes, but now time=0 indicates the onset of articulation. We see that the response peaks are delayed as the auditory feedback delay increases. This is because participants start speaking at time=0, but they hear their voice with a lag, so the response onset in these auditory regions is delayed. However, as the reviewer pointed out, the response for the DAF200 condition in Electrode G54 is slightly larger even at the very beginning. We think that this small, early response might reflect a response to the bone-conducted auditory feedback, which might be more prominent for the DAF200 condition. Nevertheless, we still see that response amplitude increases with increasing feedback delays in Electrode 63.

      (4) Figure 4C, are the labeled recording sites limited to those with significant DAF and/or suppression?

      In Figure 4C, we show electrodes that had significant high-gamma broadband responses during all tasks. We write in the Methods: “Electrodes that showed significant response increase (p < 10−4) either before (−0.5 to 0 s) or after speech onset (0 to 0.5 s) with respect to a baseline period (−1 to −0.6 s) and at the same time had a large signal-to-noise ratio (μ/σ > 0.7) during either of these time windows were selected. Electrode selection was first performed for each task separately, then electrodes that were commonly selected were further analyzed.”
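
      The quoted selection rule can be sketched as follows. This is only an illustration under stated assumptions: the p-values and signal-to-noise ratios are taken as precomputed inputs (the statistical test is not specified in this response), the two criteria are each allowed to be met in either time window, and the electrode names and numbers are made up.

```python
P_THRESHOLD = 1e-4   # significance threshold quoted in the Methods
SNR_THRESHOLD = 0.7  # minimum mean/std ratio quoted in the Methods


def passes_task(stats: dict) -> bool:
    """Apply the quoted criteria to one electrode in one task.

    `stats` holds hypothetical precomputed values for the two windows:
    p_pre, p_post (response increase vs. baseline) and snr_pre, snr_post.
    """
    significant = stats["p_pre"] < P_THRESHOLD or stats["p_post"] < P_THRESHOLD
    high_snr = stats["snr_pre"] > SNR_THRESHOLD or stats["snr_post"] > SNR_THRESHOLD
    return significant and high_snr


def select_common(tasks: dict) -> set:
    """Select electrodes per task, then keep those selected in every task."""
    per_task = [
        {name for name, stats in electrodes.items() if passes_task(stats)}
        for electrodes in tasks.values()
    ]
    return set.intersection(*per_task) if per_task else set()


# Made-up values for illustration only.
tasks = {
    "AWR": {
        "E1": {"p_pre": 0.5, "p_post": 1e-6, "snr_pre": 0.2, "snr_post": 1.1},
        "E2": {"p_pre": 0.3, "p_post": 1e-5, "snr_pre": 0.1, "snr_post": 0.4},
    },
    "DAF": {
        "E1": {"p_pre": 1e-5, "p_post": 0.2, "snr_pre": 0.9, "snr_post": 0.3},
        "E2": {"p_pre": 1e-7, "p_post": 1e-7, "snr_pre": 1.5, "snr_post": 1.5},
    },
}
selected = select_common(tasks)
```

      Here the hypothetical electrode E2 is significant in both tasks but fails the signal-to-noise criterion in one of them, so only E1 survives the intersection across tasks.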

      (5) Were there any analyses done to control for the effects of vocal changes on the DAF neural responses? The authors' previous paper did note a behavioral effect. This is probably not trivial, as we may not know the 'onset time' of the response, in contrast to pitch shift where it is more regular. If the timing is unknown, one thing that could be tried is to only look early in DAF responses (first 50ms say) to make sure the DAF effects hold.

      DAF involves two different perturbations: the absence of feedback at speech onset and the introduction of delayed feedback during playback. The timing of the behavioral effect in response to these two perturbations remains unclear. Aligning the neural responses to the production onset and examining the first 50ms would only capture the response to the acoustic feedback for the no-delay condition within that time window. Conversely, aligning the responses to the playback onset might miss the onset of the behavioral effect, which likely starts earlier as a response to the lack of feedback. We acknowledge the reviewer's point that this is a limitation of the DAF paradigm, and the behavioral effect is not as straightforward as that of pitch perturbation. However, we believe there is no clear solution to this issue.

      Minor points:

      (1) Figure 3, it might be nice to show the SuppI and SensI on the plots to give the reader a better sense of what those values look like.

      We included SuppI and SensI values in the new version of Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments:

      (1) In Figure 1, it is unclear whether the responses shown in B-D correspond to the ROIs shown in Figure A - I am guessing so, but the alignment of the labels makes this slightly unclear, so I suggest these be relabeled somehow for clarity.

      This is fixed in the updated version of Figure 1.

      (2) In Figure 1D the difference in colors between AWR and VWR is difficult to appreciate - I suggest using two contrasting colors.

      This is fixed in the updated version of Figure 1.

      (3) Please add y-axis labels for Fig 3B-D. (I believe these are % signal change, but it would be clearer if the label were included).

      This is fixed in the updated version of Figure 3.

      (4) Can the authors comment on whether the use of speakers for AWR and VWR versus earphones for DAF and VWF- AF may have had an influence on the increased response in this condition? If the AWR were rerun using the headphone setup, or if DAF with 0 ms feedback were run with no other trials including lags, would the large differences in response amplitude be observed?

      Participants were not wearing earphones in Experiments 1&2, and they were wearing earphones in Experiments 3&4.

      For the “no delay” condition in the DAF experiment (Experiment 3), participants were wearing earphones and reading words with simultaneous auditory feedback. So, this condition was equivalent to VWR (Experiment 2), except participants were wearing earphones. Yet, neural responses were much larger for the “no delay” condition in the DAF experiment compared to VWR.

      Supporting the reviewer’s concerns, we suspected that larger neural responses in the DAF experiment were caused by hearing auditory feedback through earphones. To test and control for this possibility, in a subset of participants, we ran the VWR-AF experiment (Experiment 4) with earphones and used the same volume settings as in the DAF experiment. We found that response magnitudes were now similar in the two experiments (Experiment 3 and 4) and earphones were indeed the reason for larger neural responses.

      (5) No data or code were available, I did not see any statement about this nor any github link or OSF link to share their data and/or code.

      Data are available in the GitHub repository: flinkerlab/Sensitivity-Suppression

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      We thank the reviewer for the time and effort in reviewing our revised manuscript and are grateful for their constructive comments and for acknowledging the significance of our work.

      Summary: 

      Their findings elucidate the mechanisms underlying 2-AA-mediated reduction of pyruvate transport into mitochondria, which impairs the interaction between ERRα and PGC1α, consequently suppressing MPC1 expression and reducing ATP production in tolerized macrophages. While the data presented is intriguing and the paper is well-written, there are several points that warrant consideration. The authors should enhance the clarity, relevance, and impact of their study. 

      Strengths: 

      This paper presents a novel discovery regarding the mechanisms through which PA regulates the bioenergetics of tolerized macrophages. 

      Weaknesses: 

      The relevance of the in vivo model to support the conclusions is questionable. Further clarification is needed on this point. 

      We appreciate the reviewer’s comment. Our conclusion that 2-AA decreases bioenergetics while sustaining the bacterial burden is further supported by additional in vivo data that we now present in Fig. S5. To strengthen the relevance of our in vivo data, we performed additional in vivo experiments in which mice received the first exposure to 2-AA by injection of 2-AA alone and the second exposure through infection with PA14 or ΔmvfR four days after the 2-AA injection. As shown in supplementary Figure S5, the levels of ATP and acetyl-CoA in the spleens of infected animals and the bacterial counts were similar between PA14- and ΔmvfR-infected mice that had received the first 2-AA exposure, and agree with the “one-shot infection” findings presented in Figure 5 for mice infected with PA14 or ΔmvfR+2-AA or receiving 2-AA only. These results are consistent with our previous findings showing that 2-AA impedes the clearance of PA14 (Bandyopadhaya et al. 2012; Bandyopadhaya et al. 2016; Tzika et al. 2013) and provide compelling evidence that the metabolic alterations identified may favor PA persistence in infected tissues.

      Reviewer #2 (Public Review): 

      We thank the reviewer for the time and effort in reviewing our revised manuscript and are grateful for their constructive comments and for acknowledging the significance of our work.

      Summary: 

      The study tries to connect energy metabolism with immune tolerance during bacterial infection. The mechanism details the role of pyruvate transporter expression via ERRalpha-PGC1 axis, resulting in pro-inflammatory TNF alpha signalling responsible for acquired infection tolerance. 

      Strengths: 

      Overall, the study is an excellent addition to the role of energy metabolism during bacterial infection. The mechanism-based approach in dissecting the roles of metabolic coactivator, transcription factor, mitochondrial transporter, and pro-inflammatory cytokine during acquired tolerance towards infections indicates a detailed and well-written study. The in vivo studies in mice nicely corroborate with the cell line-based data, indicating the requirement for further studies in human infections with another bacterial model system. 

      Weaknesses:

      The authors have involved various mechanisms to justify their findings. However, they have missed out on certain aspects which connect the mechanism throughout the paper. For example, they measured ATP and acetyl CoA production linked with bacterial re-exposures and added various targets like MPC1, ERR alpha, PGC1 alpha, and TNF alpha. However, they skipped PGC1 alpha levels, ATP and acetyl CoA in various parts of the paper. Including the details would make the work more comprehensive. 

      We appreciate the reviewer’s comments and apologize for omitting the PGC-1α levels.  Per the reviewer’s suggestion, we have added the PGC-1α transcript levels (Figure 4C) in the section describing 2-AA-mediated dysregulation of the ERRα and MPC1 transcription (lines 243-252). Moreover, we have added Figure S5, which shows additional ATP and acetyl CoA levels in vivo. In our view, ATP and acetyl-CoA levels are shown in all appropriate settings, interrogating the bioenergetics, including in the presence of bacteria and in their absence, where only 2-AA is added. Please see Figures 1 and 5 and the newly added Figure S5.

      The use of public data sets to support their claim on immune tolerance is missing. Including various data sets of similar studies will strengthen the findings independently. 

      If we understand the reviewer’s comment regarding public data sets on immune tolerance correctly, we refer only to our own data because there are no published data from other groups on 2-AA tolerization and because the effect of 2-AA on the bacterial burden differs from that of LPS. Therefore, this study did not include a comparison with published LPS data.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Animal model: The authors appropriately initiated the study with an in vitro tolerization model involving 2-AA re-exposure, providing foundational insights for further investigation. However, the rationale for the one-shot injection in the in vivo model lacks clarity. To strengthen the relevance of the in vivo data, the authors should consider establishing a model involving bacterial re-exposure, such as a two-challenge paradigm with antibiotic treatment in between. This approach would allow for the examination of peritoneal macrophages harvested from mice, assessing ATP levels, acetyl CoA, TNF production, and bacterial counts. Such an approach would better align the in vivo findings with the in vitro experiments, confirming the role of tolerized macrophages in controlling PA infection in the presence of 2-AA. 

      We thank the reviewer for this comment. Indeed, we have performed a similar two-challenge study in which the first exposure to 2-AA is achieved by injecting 2-AA and the second exposure through infection with PA14 or ΔmvfR four days post-2-AA injection. The results in Figure S5 can be directly compared with those of the in vivo studies in Fig 5. As shown in supplementary Figure S5, the levels of ATP and acetyl-CoA in the spleen of infected animals and the bacterial counts agree with the “one-shot infection” results presented in Fig 5 (PA14 or ΔmvfR+2-AA). Although the Figure S5 study was not included initially, to simplify data presentation, it was performed in parallel with the Fig 5 study, and thus the two can be directly compared.

      (2) Exogenous ATP treatment: It is crucial to explore whether 2-AA re-exposure suppresses inflammasome activation and whether this suppression can be reversed by exogenous ATP treatment. Specifically, the authors should investigate whether NLRP3 inflammasome activation is inhibited in tolerized macrophages and whether such activation is necessary for host defense. Clarifying these points would provide valuable insights into the mechanisms underlying macrophage tolerization induced by 2-AA. 

      Excellent point. We agree; indeed, this is planned for the near future.

      (3) Figures 4C and D: The authors should exercise care in describing these figures. For instance, line 263 states that "UK5099 had no effect on the PA14 burden in macrophages," which requires correction for accuracy. 

      We apologize and have rephrased this sentence and other sentences referring to Fig 4D and 4E in this section. Please see the highlighted sentences in the results section referring to Fig 4. For example, “The addition of the UK5099 inhibitor strongly enhanced the bacterial intracellular burden in ΔmvfR infected macrophages compared to the non-inhibited ΔmvfR infected cells, reaching a similar burden to those infected with PA14 (Fig. 4D)”.

      (4) ERRα expression: While the study intriguingly demonstrates a decrease in ERRα levels in tolerized macrophages following exposure to 2-AA, the discussion of this finding is lacking. It is worth exploring the possibility of increasing ERRα expression to counteract the tolerization induced by 2-AA and enhance clearance of PA infection. This avenue should be thoroughly discussed in the manuscript's Discussion section, offering insights into potential therapeutic strategies to mitigate the effects of 2-AA on macrophage function. 

      Thank you so much for this additional comment.  We have now included this point in the discussion section (lines 373-376).

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the study is an excellent addition to the role of energy metabolism during bacterial infection. The mechanism-based approach in dissecting the roles of metabolic coactivator, transcription factor, mitochondrial transporter, and pro-inflammatory cytokine during acquired tolerance indicates a detailed and well-written study. However, connecting the mechanisms often was not reflected in some of the experiments, and answering a few concerns/suggestions will undoubtedly improve the study's readability, appeal, and overall impact on a broader audience. 

      (1) The authors should rephrase the title if possible. The title indicates 2AA as a bacterial quorum sensing signal; however, throughout the manuscript, there are no studies associated with actual quorum sensing in bacteria. 

      Thank you for this comment. However, the title indicates 2-AA as a quorum sensing molecule because the synthesis of this signaling molecule is uniquely regulated by quorum sensing. Because of its importance in the virulence of Pseudomonas aeruginosa and its regulation by quorum sensing, we feel that it is appropriate to refer to it as such.

      (2) The authors generalised immunotolerance and memory of 2AA-exposed cells to broad-spectrum microbial exposure by just testing with LPS exposure. I would suggest they test at least 2 more heterologous microbial products known to elicit a response and confirm their claim from Figure 1. 

      We appreciate the reviewer’s comment. We intend not to generalize immunotolerance and memory of 2-AA exposed cells to broad-spectrum microbial exposure. Moreover, since the manuscript is not focused on comparing other bacterial molecules to 2-AA and multiple studies have focused on LPS tolerance, we tested LPS only in the manuscript.

      (3) LPS triggers ATP production through glycolysis in nitric oxide (NO) dependent mechanisms in various immune and non-immune cells. The authors should study the concentrations of NO, Glucose, and Pyruvate levels to clarify the mechanism of energy dynamics and the source of ATP and Acetyl CoA generated/scavenged during primary and secondary exposures to both 2AA and LPS. 

      We agree that a cross-tolerization experiment using 2-AA and LPS would reveal interesting insights into the immune response during PA infections. However, this is beyond the scope of this article. Please note that 2-AA and LPS tolerization are mechanistically distinct: for example, they rely on different HDAC enzymes, and LPS tolerization predominantly involves changes in H3K27 acetylation (Lauterbach et al. 2019), whereas 2-AA tolerization involves H3K18 modifications (Bandyopadhaya, Tsurumi, and Rahme 2017). Given this complexity, such interactions would require a comprehensive set of experiments that are not part of the focus of this study.

      (4) Immunogenic triggers often rapidly alter mitochondrial membrane potential, which alters oxygen consumption rates. However, the authors tend to generalize energy homeostasis and claim the deregulation of OXPHOS-inducing quiescent phenotype depending upon OCR measurements from Figure 1D. The authors must evaluate mitochondrial health and membrane potential during first and second exposure in a time-dependent manner to strengthen their theory of mitochondrial dysfunction. The authors should also check the phenomena in vivo (mice exposed to infection) if possible. 

      Thank you for this suggestion. We now include electron microscopy images of mitochondria isolated from macrophages exposed to 2-AA. The results revealed that 2-AA alters mitochondrial morphology and cristae, supporting the mitochondrial dysfunction caused by 2-AA. These results are shown in Figure S4 and described in lines 185-188.

      (5) Since both MCP1 and MCP2 transporters are known to transport pyruvate to mitochondria, checking both MCP1 and 2 at transcript and protein levels in exposed cells will be essential. I suggest authors use MCP inhibitors or use RNA interference against MCPs to check the effect on tolerance of the cells exposed for a second time. 

      To our understanding, the mitochondrial pyruvate carrier proteins MPC1 and MPC2 form a hetero-oligomeric complex in the inner mitochondrial membrane to facilitate pyruvate import into mitochondria (McCommis and Finck 2015). We also used UK5099, an MPC inhibitor, when enumerating the bacterial load in macrophages (Figure 4) and observed an effect similar to that of 2-AA, suggesting a similar mechanism of action.

      (6) The pyruvate levels of mitochondria in Figure 2A are shallow, and the authors claim statistical significance within a 1.5-fold change. The authors should cross-check the number of mitochondria they are isolating while estimating pyruvate from only mitochondrial fractions. Another point is, correlating mitochondrial pyruvate with the burst of ATP during first exposure in comparison to second exposure, one can argue that the number of mitochondria is variable between the exposures leading to a change in pyruvate amount (mitochondria number increases to compensate for the first exposure and decreases quickly to maintain homeostasis and remains quiescent during a second exposure due to activation of compensatory immune mechanism towards primary exposure). How do authors address the issue? 

      Our electron microscopy studies indicate that, although no reduction in mitochondrial numbers is observed in macrophages after 2-AA exposure, alterations in mitochondrial morphology and cristae are observed. Please also see our answer to point #4.

      (7) The authors claim that ERR alpha regulates MCP1 transcription via activation of ERRalpha-PGC1 alpha axis and tolerization in cells to second exposure is due to impairment of the axis (Figure 3). PGC1 alpha is known to be induced during various metabolic, physiological, and immune-challenge-related stress in a tissue-dependent manner. In this context, one should expect changes in transcript and protein levels of PGC1 alpha. The authors must study PGC1 alpha levels with time-dependent exposures. LPS was shown to induce oscillations in PGC1 alpha levels in a tissue-specific manner. In experiments, authors should verify if such oscillations persist during time-dependent exposure, emphasising mitochondrial uncoupling that might get dampened during re-exposures to microbial challenges. 

      We appreciate the suggestion. We have now included PGC-1α transcript levels (Figure 4C), which show the same profile as the transcript levels of ERRα and MPC1. Please note that PGC-1α is only one of several ERRα co-activators; therefore, the amount of ERRα protein is the most relevant assessment regarding the activation of MPC1 transcription.

      (8) The authors claim that ERRalpha induces MCP1 through ChIP data in Figure 3. However, the physical verifications at mRNA levels and mutational/inhibitor-based experiments are missing. The authors should study the alterations of MCP1 mRNA in relation to exposures and inhibitors of ERRalpha and PGC1 alpha to strengthen their work. 

      This is an interesting approach; however, this experiment exceeds the scope of our manuscript. We will certainly consider this suggestion in our future experiments. Thank you.

      (9) Publicly available data sets with LPS exposures should be analyzed for gene sets pertaining to mitochondrial OXPHOS, metabolism, immune response, etc. This will support the authors' work and provide a global overview of transcriptome associated with immune tolerance. 

      We appreciate the reviewer’s comment. For the reasons explained in point #3, and because the bacterial burden outcome of the 2-AA effect is different from that of LPS, comparison with published LPS data was not considered in this study. We agree that in the future, a comprehensive comparison of whole-genome transcriptome studies between LPS and 2-AA may reveal important insights that may also help better understand and potentially classify the immune tolerance triggered by 2-AA.

      (10) In Figure 4, the authors study the role of MCP1 and associated pyruvate-dependent bacterial clearance during tolerization and associate them with a decrease in TNF alpha. I would suggest the addition of an ERR alpha inhibitor in these experiments. It is not clear as to why (mechanism) TNF alpha transcription was affected via pyruvate transport during bacterial exposure. I would suggest that the authors clarify the mechanism of TNF alpha activation/inactivation and its association with energy metabolism during acquired tolerance. 

      This is an excellent suggestion, given that a similar effect of ERRα on TNF-α was observed by other researchers (Chaltel-Lima et al. 2023).  Here, to clarify the mechanism of TNF alpha activation/inactivation and its association with energy metabolism, we elaborate on this aspect in the discussion section.

      Lines 388-393. The text reads:

      Previously, we reported that 2-AA tolerization induces histone deacetylation via HDAC1, reducing H3K18ac at the TNF-α promoter (Bandyopadhaya et al. 2016). The findings with acetyl-CoA reduction, the primary substrate of histone acetylation, and the TNF-α transcription using UK5099 and ATP in 2-AA treated macrophages are in support of the bioenergetics disturbances observed in macrophages and their link to epigenetic modifications we have shown to be promoted by 2-AA (Bandyopadhaya et al. 2016).

      (11) It is surprising that the authors specifically target TNF alpha as a pro-inflammatory cytokine during tolerance. Various reports indicate that cytokines and immune modulatory factors play a vital role in immune tolerance upon bacterial exposure. I would suggest the authors perform cytokine profiling or check public data sets to specify their reason for choosing TNF alpha. 

      The choice of TNF-α is based on the results obtained in our previous study  (Bandyopadhaya et al. 2016).

      Bandyopadhaya, A., M. Kesarwani, Y. A. Que, J. He, K. Padfield, R. Tompkins, and L. G. Rahme. 2012. 'The quorum sensing volatile molecule 2-amino acetophenon modulates host immune responses in a manner that promotes life with unwanted guests', PLoS pathogens, 8: e1003024.

      Bandyopadhaya, A., A. Tsurumi, D. Maura, K. L. Jeffrey, and L. G. Rahme. 2016. 'A quorum-sensing signal promotes host tolerance training through HDAC1-mediated epigenetic reprogramming', Nat Microbiol, 1: 16174.

      Bandyopadhaya, A., A. Tsurumi, and L. G. Rahme. 2017. 'NF-kappaBp50 and HDAC1 Interaction Is Implicated in the Host Tolerance to Infection Mediated by the Bacterial Quorum Sensing Signal 2-Aminoacetophenone', Front Microbiol, 8: 1211.

      Chaltel-Lima, L., F. Domínguez, L. Domínguez-Ramírez, and P. Cortes-Hernandez. 2023. 'The Role of the Estrogen-Related Receptor Alpha (ERRa) in Hypoxia and Its Implications for Cancer Metabolism', Int J Mol Sci, 24.

      Lauterbach, M. A., J. E. Hanke, M. Serefidou, M. S. J. Mangan, C. C. Kolbe, T. Hess, M. Rothe, R. Kaiser, F. Hoss, J. Gehlen, G. Engels, M. Kreutzenbeck, S. V. Schmidt, A. Christ, A. Imhof, K. Hiller, and E. Latz. 2019. 'Toll-like Receptor Signaling Rewires Macrophage Metabolism and Promotes Histone Acetylation via ATP-Citrate Lyase', Immunity, 51: 997-1011 e7.

      McCommis, K. S., and B. N. Finck. 2015. 'Mitochondrial pyruvate transport: a historical perspective and future research directions', Biochem J, 466: 443-54.

      Tzika, A. A., C. Constantinou, A. Bandyopadhaya, N. Psychogios, S. Lee, M. Mindrinos, J. A. Martyn, R. G. Tompkins, and L. G. Rahme. 2013. 'A small volatile bacterial molecule triggers mitochondrial dysfunction in murine skeletal muscle', PloS one, 8: e74528.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This manuscript is a valuable study of the responses of GPi neurons to DBS stimulation in human PD and dystonia patients and it finds evidence for altered short-term and long-term plasticity in response to DBS between the two patient populations. This data set is of interest to both basic and clinical researchers working in the field of DBS and movement disorders. While there was enthusiasm for the potential significance of these findings, support for their conclusions was incomplete. Their data may be indicative of more interesting and complex interpretations than currently considered in the article. 

      The authors would like to express their gratitude to the Editorial Team and Reviewers for their invaluable feedback which helped to improve the manuscript.

      Reviewer #1:

      Summary:

      Sumarac et al investigate differences in globus pallidus internus (GPi) spike activity and short- and long-term plasticity of direct pathway projections in patients with Parkinson's disease (PD) and dystonia. Their main claims are that GPi neurons exhibit distinct characteristics in these two disorders, with PD associated with specific power-frequency oscillations and dystonia showing lower firing rates, increased burstiness, and less regular activity. Additionally, long-term plasticity and synaptic depression appear to differ between the two conditions. The authors suggest that these findings support the concept of hyperfunctional GPi output in PD and hypofunctional output in dystonia, possibly driven by variations in the plasticity of striato-pallidal synapses. Overall enthusiasm is relatively high, but I think the discussion omits findings that don't align well with standard models. 

      Strengths: 

      These types of studies are valuable as the data arise from patients who have dystonia or PD. This could provide unique insights into disease pathophysiology that might not be recapitulated in animal systems work. 

      Thank you for the positive feedback.

      Weaknesses: 

      - The rate model and indirect/direct pathway ideas lack explanatory power; too much of the hypothesis generation and discussion in this manuscript is set in the context of these old ideas. Their data in my view emphasize this somewhat emphatically. Most patients with the 'hypokinetic' movement disorder PD have dystonia as a part of their motor features. Dystonia is a form of excessive muscle activation that on the one hand is 'hyperkinetic' but on the other usually decreases the speed of motor tasks, even in patients with primary dystonia. Similarly, PD patients display a bewildering variety of hyperkinetic manifestations as well (rest tremor, dystonia, dyskinesia). If these are truly independent classifications, i.e. hyper- versus hypo-kinetic, the authors must acknowledge that there is considerable overlap in the spike activity across groups - numerous dystonia patients display higher discharge rates than the majority of the PD sample. Based on the firing rate alone, it would not be possible to distinguish these groups. 

      Thank you for your insightful comments regarding the discussion of the rate model and the distinction between hyperkinetic and hypokinetic movement disorders. We acknowledge that the rate model, primarily derived from a limited number of animal subjects [1], may not fully encapsulate the complexities of Parkinson's disease (PD) and dystonia. Our study aimed to validate animal model findings in humans by correlating single-neuron features with disease symptom severity. However, we concur with the Reviewer’s comment regarding the overlapping motor features in hypokinetic and hyperkinetic disorders. We can speculate that the overlap in neuronal properties may be reflected in the overlap of clinical features, for example, hyperkinetic features also being present in PD, as suggested by the Reviewer. Per the Reviewer’s request, we have now acknowledged this notion in the manuscript. Interestingly, hypokinetic symptoms have been reported to occur in dystonia in response to GPi-stimulation and have been associated with beta activity in the LFP [2], which reinforces the notion that neural activity may be more related to specific symptoms rather than diseases as a whole. Supplementing our analyses, in addition to total UPDRSIII scores, we have now provided correlations with only hypokinetic (i.e. bradykinesia) subscores of the UPDRSIII to focus on a more direct assessment of hypokinetic features in PD versus hyperkinetic features in dystonia. We have updated our methods and results accordingly.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [2] R. Lofredi et al., “Pallidal Beta Activity Is Linked to Stimulation-Induced Slowness in Dystonia,” Movement Disorders, vol. 38, no. 5, pp. 894–899, 2023, doi: 10.1002/mds.29347.

      Amendments to the manuscript:

      “Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients.”

      “Given that UPDRSIII includes both hypokinetic and hyperkinetic symptoms of PD, we further sought to disaggregate the score by only considering items 23-26 in UPDRSIII, which assess hypokinetic symptoms of PD.”

      “… with a marginally stronger correlation for PD hypokinetic symptoms only (items 23-26 of UPDRSIII, Spearman's rho=0.32, p=.0330; Supplementary Fig. 3)”

      Supplementary Fig. 3: We provided correlations with hypokinetic (i.e., bradykinesia) subscore of the UPDRSIII. There is very little difference between correlation results of UPDRSIII total (Fig. 1) and the hypokinetic-only subscore (Supplementary Fig. 3).

      “though our results do not change substantially when only hypokinetic PD features are considered (Supplementary Fig. 3).”

      - If beta power is pathognomonic of parkinsonism, the authors found no differences in beta-related spike discharges across the groups. One would have predicted greater beta power in PD than in primary dystonia. This should be discussed explicitly and an interpretation should be provided. 

      We agree with the reviewer that, considering the previous LFP literature, one might have expected a difference in single-neuron oscillation power between PD and dystonia. However, while prior studies [3], [4] have reported significant differences in oscillatory power between the two diseases, they examined local field potential (LFP) activity only. Other work [5] in non-human primates investigated single-neuron oscillations and reported no differences between PD and dystonia at the single-neuron level, in line with our findings. Despite the lack of difference in overall power presented here, we provide evidence that the strength of the beta-frequency single-neuron oscillations nevertheless correlates with symptom severity in PD but not dystonia, whereas the strength of the theta-frequency single-neuron oscillations correlates with symptom severity in dystonia but not PD.

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      Amendments to the manuscript:

      “Although previous research has reported differences in the LFP power between PD and dystonia [27,28], a study in non-human primates found no such differences in single-neuron oscillatory strength [8], as reflected in our findings. However, despite a lack of difference in overall power across disorders, we were able to derive disease/frequency-specific relationships with respect to clinical scores (Fig. 1C; oscillatory features).”

      - The study lacks a healthy control group, making it challenging to differentiate disease-specific findings from normal variations in GPi activity and plasticity. Although this is acknowledged in the discussion, this complicates the interpretation of the results. The sample sizes for PD and dystonia patients are relatively small, and the study combines various forms of dystonia, potentially masking subtype-specific differences. A larger and more homogenous sample could enhance the study's reliability.

      Indeed, intraoperative microelectrode recordings cannot be obtained in healthy individuals. We agree with the Reviewer that this limits the interpretation of the data. However, directly comparing clinical correlations with single-neuron readouts between two distinct clinical entities may, to some degree, compensate for the lack of healthy control data. This contrast, while not providing a healthy control, is still able to point to disease-specific differences. This approach has previously been used for comparisons at the LFP level [6]. While the sample size is indeed small, it is comparable to or even larger than that of similar studies that have investigated the relation between symptom severity and single-neuron readouts [7]. The Reviewer is right in that we do not differentiate between generalized or cervical dystonia. We chose to do so because our subgroup analysis provided in the Supplementary Material did not suggest specific differences, though there is insufficient data from specific dystonia subtypes to make formal statistical comparisons. Indeed, future studies should investigate specific subtypes further.

      [6] R. Lofredi et al., “Pallidal beta bursts in Parkinson’s disease and dystonia,” Movement Disorders, vol. 34, no. 3, pp. 420–424, 2019, doi: 10.1002/mds.27524.

      [7] A. Gulberti et al., “Subthalamic and nigral neurons are differentially modulated during parkinsonian gait,” Brain, p. awad006, Feb. 2023, doi: 10.1093/brain/awad006.

      Amendments to the manuscript:

      “While we did not observe differences across dystonia subtypes (Supplementary Fig. 1), future studies in larger patient cohorts are warranted. Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”

      - While they mention that data are available on request, sharing data openly would increase transparency and allow for independent validation of the results. It is unclear how sharing deidentified data would compromise patient privacy or present ethical issues of any kind, as claimed by the authors. 

      Much of the data in question were collected under an old Research Ethics Board (REB) protocol which did not address data sharing. However, we have consulted with our REB and gained retroactive permission to post de-identified data which are now available in the Supplementary Material.

      Amendments to the manuscript:

      “The data that support the findings of this study are available in a public repository (see: https://osf.io/nqzd2/)”

      - They appropriately acknowledge several limitations, such as the inability to use pharmacological interventions and the need for further research in the chronic setting. 

      Thank you for the comment.

      - The manuscript highlights differences in GPi activity and plasticity between PD and dystonia but could provide more context on the clinical implications of these findings, particularly regarding what the implications would be for novel paradigms for deep brain stimulation. 

      Thank you for the comment. Our finding that striato-pallidal plasticity decays more slowly in dystonia compared to PD may relate to the slower time course of symptom relief associated with GPi-DBS in dystonia, as presently outlined in the discussion. On the other hand, symptoms are also suppressed for longer after the cessation of stimulation in dystonia compared to PD, which may reflect long-term plastic changes [8], [9]. In the context of clinical DBS, plasticity modulation may be facilitated by intermittent stimulation algorithms that achieve the necessary plastic network change by applying stimulation for a defined time, after which stimulation could be switched off to reduce energy consumption and perhaps to mitigate side effects. DBS devices with chronic sensing may enable monitoring of evoked potential amplitudes for future adaptive stimulation applications. Currently available devices are limited by low sampling rates, but future devices may overcome these technical limitations.

      [8] D. Ruge et al., “Deep brain stimulation effects in dystonia: time course of electrophysiological changes in early treatment.,” Mov Disord, vol. 26, no. 10, pp. 1913–1921, Aug. 2011, doi: 10.1002/mds.23731.

      [9] D. Ruge et al., “Shaping reversibility? Long-term deep brain stimulation in dystonia: the relationship between effects on electrophysiology and clinical symptoms.,” Brain, vol. 134, no. Pt 7, pp. 2106–2115, Jul. 2011, doi: 10.1093/brain/awr122.

      Amendments to the manuscript:

      “While further work is certainly required to better understand disease-related differences in plasticity, our findings may nevertheless motivate the development of periodic intermittent (ON/OFF) DBS strategies which periodically modulate synaptic plasticity for therapeutic benefits which outlast stimulation delivery, as have recently been employed in preclinical work [52,53].”

      - While statistical tests are mentioned, the manuscript could benefit from a more detailed presentation of statistical methods, including correction for multiple comparisons and effect sizes. Did the authors consider different recording sites within each patient as independent observations? I think this is not appropriate if that was the case. 

      Thank you for your constructive feedback. In response to the concerns regarding the statistical methods, we have expanded our analysis to provide a more comprehensive statistical overview. Specifically, we implemented the Bonferroni correction for multiple comparisons across the seven tests conducted for the differences in single-neuron features between PD and dystonia. The adjustment revealed that only the burst index and coefficient of variation retain statistical significance after post hoc correction, while the firing rate does not. Results of the Bonferroni corrections are now presented in Supplementary Table 3. Reflecting on the initial comment about firing rates between the two disorders, our updated findings underscore the limitation of using firing rates alone to differentiate between PD and dystonia, and instead, our analysis now points to burstiness and firing irregularity as more reliable discriminators. Regarding the clinical correlations, we refined our statistical analysis by employing nonparametric Monte Carlo permutation tests with 5000 permutations, as used in recent work [10], [11]. This method was chosen for its independence from assumptions regarding data distribution. Specifically, we computed Spearman's rho and tested it for significance using the permutation test. Then, to address multiple comparisons, we controlled the false discovery rate (FDR) using the Benjamini-Hochberg procedure. Results of these comparisons are now presented in Supplementary Table 4. Lastly, to address the concern regarding recording site independence within patients, we updated our plasticity analysis methodology. In our study, 6 out of 18 patients had multiple recording sites. To account for the non-independence of these observations, we employed linear mixed models (LMM) with patient ID as a random factor.

      [10] R. Lofredi et al., “Dopamine-dependent scaling of subthalamic gamma bursts with movement velocity in patients with Parkinson’s disease,” Elife, vol. 7, p. e31895, Feb. 2018, doi: 10.7554/eLife.31895.

      [11] R. Lofredi et al., “Subthalamic beta bursts correlate with dopamine-dependent motor symptoms in 106 Parkinson’s patients,” npj Parkinsons Dis., vol. 9, no. 1, Art. no. 1, Jan. 2023, doi: 10.1038/s41531-022-00443-3.
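
      The permutation procedure described in the response above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the two-sided comparison on the absolute value of rho, the add-one correction, and the fixed random seed are all assumptions made here for illustration, and only the Python standard library is used.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation test for Spearman's rho (assumed form).

    One variable is shuffled repeatedly to build a null distribution, and the
    p-value is the share of shuffles at least as extreme as the observed rho.
    """
    rng = random.Random(seed)
    observed = spearman_rho(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding on exact ties.
        if abs(spearman_rho(x, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)
```

      Calling `permutation_p_value(feature, score)` with a neuronal feature and a clinical score per patient (hypothetical inputs) returns the observed rho together with its estimated p-value. Because only ranks enter the statistic, no assumption about the distribution of either variable is needed.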

      Amendments to the manuscript:

      “For comparing differences in single-neuron features between PD and dystonia, significant results were followed up with post hoc multiple comparisons with a Bonferroni correction. For clinical correlations, non-parametric Monte Carlo permutation tests were used, avoiding assumptions about data distribution. The tested values were randomly shuffled 5,000 times to form a probability distribution, with the p-value reflecting the original sample rank. All tests underwent adjustment for multiple comparisons, controlling the false discovery rate (FDR) at an α-level of 0.05.”

      “analyzed using a linear mixed model (LMM) with patient ID as a random factor, normalized fEP amplitudes as the response variable, and epoch as a fixed effect”

      “using a LMM with patient ID as a random factor”

      “However, none of the clinical correlations survived Benjamini-Hochberg FDR-correction for multiple comparisons (Supplementary Table 4).”

      “In PD, fEP amplitudes were significantly greater after compared to before HFS (LMM; p = .0075, effect size = 5.42 ± 1.79; Fig. 2C), while in dystonia, the increase approached but did not reach statistical significance (LMM; p = .0708, effect size = 2.82 ± 1.45; Fig. 2C).”

      All statistics were updated in the results section and the figures.

      “Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”
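
      The two multiple-comparison corrections named in the amendments above (Bonferroni and Benjamini-Hochberg) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, and adjusted p-values are returned so that each can be compared directly against the chosen α-level.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over its own and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      A test is declared significant when its adjusted p-value falls below the α-level (0.05 in the amended methods). Bonferroni controls the family-wise error rate and is the stricter of the two, whereas Benjamini-Hochberg controls the expected proportion of false discoveries.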

      - The manuscript could elaborate on the potential mechanisms underlying the observed differences in GPi activity and plasticity and their relevance to the pathophysiology of PD and dystonia. 

      Thank you for your feedback. We have enhanced the manuscript by integrating additional discussions on previous studies related to plasticity in dystonia and PD (e.g., [12], [13]), which highlight excessive plasticity in dystonia. Although these may appear contradictory to our findings of increased plasticity in PD compared to dystonia, we propose (also justified by previous literature) that chronic dopaminergic medication use may lead to synaptic over-sensitization, which has been hypothesized as a biological mechanism underlying levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      Amendments to the manuscript:

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the magnitude of direct pathway plasticity [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      Reviewer #2: 

      Summary: 

      The authors investigated how neuronal activity and metrics of plasticity using local electrical stimulation in the GPi were different between Parkinson's disease and dystonia patients. 

      Strengths: 

      The introduction highlights the importance of the work and the fundamental background needed to understand the rest of the paper. It also clearly lays out the novelty (i.e., that the dynamics of plastic effects in GPi between dystonia and PD have not been directly compared). 

      The methods are clearly described and the results are well organized in the figures. 

      The results are strong with measurements from a large population of patients for each disease group and with distinct findings for each group. 

      Thank you for the kind appraisal.

      Weaknesses: 

      The discussion was hard to follow in several places, making it difficult to fully appreciate how well the authors' claims and conclusions are justified by their data, mostly in relation to the plasticity results. It may help to summarize the relevant findings for each section first and then further expand on the interpretation, comparison with prior work, and broader significance. Currently, it is hard to follow each section without knowing which results are being discussed until the very end of the section. With the current wording in the "Neuronal correlates.." section, it is not always clear which results are from the current manuscript, and where the authors are referring to past work.

      Thank you for this feedback. The main findings are now summarized in a paragraph at the beginning of the Discussion section, before being discussed in comparison to other studies in the literature in subsequent sub-sections. Moreover, throughout the Discussion, findings from our study are now always reflected by a reference to the relevant figure to more easily differentiate current findings from previous literature. Additionally, Discussion sub-sections have been expanded to consider additional literature in response to various comments throughout the Review process (including the subsequent Review comment).

      Amendments to the manuscript:

      Paper findings are referenced to figures which depict the results at hand; discussion sub-sections expanded; and the following text has been added at the start of the Discussion:

      “In particular, we found that GPi neurons exhibited lower firing rates, but greater burstiness and variability in dystonia compared to PD (Fig. 1A). While no differences were found in the power of spiketrain oscillations across disorders (Fig. 1B), we found that PD symptom severity positively correlated with the power of low-beta frequency spiketrain oscillations, whereas dystonia symptom severity positively correlated with the power of theta frequency spiketrain oscillations (Fig. 1C). Dystonia symptom severity moreover correlated negatively with firing rate, and positively with neuronal variability. These results are discussed in greater detail with respect to previous literature in the subsequent Discussion section entitled “Neuronal correlates of PD and dystonia.” In response to electrical stimulation (protocol depicted in Fig. 2A), we found significant increases in the amplitudes of positive-going stimulation-evoked field potential amplitudes (considered to reflect striato-pallidal synaptic strength; as exemplified in Fig. 2B) before versus after HFS in both PD and dystonia (Fig. 2C); with recording sites in PD exhibiting significantly greater increases (Fig. 2D). While changes to evoked potential amplitude before versus after stimulation can be considered to be reflective of long-term plasticity [15,18], the dynamics of evoked potentials during HFS (as depicted in Fig. 2E) can be considered as reflective of short-term synaptic plasticity [18,21]. To this end, our findings are suggestive of faster latency synaptic depression in PD compared to dystonia (Fig. 2F/G). Plasticity findings are discussed in greater detail in the Discussion section entitled “Direct pathway plasticity.”

      Also, I felt that more discussion could be used to highlight the significance of the current results by comparing and/or contrasting them to prior relevant work and mechanisms. The novelty or impact is not very clear as written. Could this be further substantiated in the Discussion? 

      Thank you for the feedback. The discussion has been expanded to include additional literature that is relevant to the findings reported in the manuscript. For example, with regard to the neuronal correlates sub-section, we now highlight the important findings [15] that show changes to the discharge rates and oscillatory tendencies of GPi neurons in non-human primates in response to staged MPTP applications to progressively titrate motor severity; these results substantiate our lack of correlation with firing rates in PD, and the presence of a clinical correlation with beta oscillations. We additionally now emphasize human studies that found LFP power differences between PD and dystonia [3], [4]; but simultaneously highlight studies that did not find such differences in spike-train oscillations (in non-human primates) [5], which is reflective of our own findings. With regard to our plasticity sub-section, we have added new content related to previous literature on plasticity in dystonia and PD (also addressed in response to a query from Reviewer #1). For example, we bring to light a variety of previous studies [12], [13] emphasizing excessive plasticity in dystonia. However, while such studies may seem to contradict our findings of greater plasticity in PD compared to dystonia, we additionally provide hypotheses (justified by previous literature) that prolonged use of dopaminergic medication may result in synaptic over-sensitization, thus giving rise to levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression. Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients. While differences in discharge rates were nevertheless observed between PD and dystonia, it may be that the combination of rate and pattern (reflected in the BI and CV) changes best differentiates the two disorders.”

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation (LTP) at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that LTP effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      Some specific comments and questions about the Discussion: 

      Lines 209-211 - This sentence was hard to understand, could it be clarified? 

      Lines 211-213 - What do phasic and tonic components mean exactly? Could this be specifically defined? Are there specific timescales (as referred to in Intro)?

      Lines 215-217 - It's not clear what was delayed in dystonia, and how the authors are trying to contrast this with the faster time course in PD. I think some of this is explained in the introduction, but could also be re-summarized here as relevant to the results discussed. 

      Lines 223-224 - I'm not sure I follow the implication that network reorganization leads to delayed functional benefits. Could this be further elaborated? 

      Reply & Amendments to the manuscript: Thank you for your feedback. We've made the following concise revisions to address the comments:

      We've clarified lines 209-211 to explain that variations in electrical stimulation effects on pathways in PD and dystonia may reveal the operational mechanisms of DBS, despite a common target:

      “The variation in the modulation of these projections / pathways to electrical stimulation may also indicate the mechanism by which DBS operates across PD and dystonia, despite a common stimulation target.”

      In response to the second comment on lines 211-213 about phasic and tonic components, we now specify that phasic refers to dynamic muscle contractions, and tonic to continuous muscle contractions, providing clear definitions relevant to our context:

      “Clinical studies in dystonia have shown that DBS leads to a more rapid improvement in the transient, dynamic muscle contractions (phasic components) of the disorder when compared to the sustained, continuous muscle contractions (tonic or fixed components) [33]”

      For lines 215-217, we've refined our discussion to clearly contrast the delayed response in dystonia with the faster onset in PD:

      “This contrasts with PD, where the maximal clinical response to DBS occurs within a much faster time course [13,36].”

      On lines 223-224, we've expanded the explanation of how network reorganization may lead to delayed functional benefits, highlighting adjustments in neural connectivity and synaptic efficacy in response to stimulation:

      “which involves adjustments in neural connectivity or synaptic efficacy in response to the stimulation [14,35].”

      Could the absence of a relationship between FR and disease in PD be discussed? 

      Thank you for raising this point. Despite observing higher firing rates in PD compared to dystonia, it is unexpected that these rates do not correlate with symptom severity according to the rate model of PD [1]. However, despite the lack of correlations with firing rates, our findings align with similar animal work of Muralidharan et al. [15], which reported that neuronal firing rates within the GPi of rhesus monkeys did not increase linearly with respect to varying intensities of parkinsonian motor severity. We did however show that low beta oscillatory strength within the GPi may play a significant role in the manifestation of motor symptoms in PD; which is also in line with findings of Muralidharan and colleagues. As per the Reviewer’s request, we have included this content into our discussion.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression.”

      “Indeed, Muralidharan and colleagues [25] also showed linear group-level relationships between low-beta frequency spiketrain oscillations and disease severity in parkinsonian non-human primates, despite the lack of linear relationships with spike discharge rates (as discussed above).”

      It wasn't very clear how the direct pathway can be attributed to plasticity changes if the GPi makes up both the direct and indirect pathways. Could this be further clarified? 

      The reviewer brings up an important nuanced point. Recent work from our lab [16] shows that inhibitory evoked fields in STN (which receives inhibitory inputs from GPe; no other inhibitory sources) are persistent with very minimal depression during HFS. On the other hand, inhibitory fields in the SNr (which receives the majority of its inhibitory inputs from striatum; though some come by way of GPe as well per anatomical literature) depress quickly. We have previously also shown these rapidly depressing fields in GPi [17], [18], which also receives the majority of its inhibitory inputs via striatum, though some also from GPe. As such, the disaggregation of striatum-mediated versus GPe-mediated inhibitory fields is achieved based on: lack of rapidly depressing inhibitory evoked field potentials in STN (which receives inhibitory inputs via GPe and not striatum), but a common presence of rapidly depressing evoked field potentials in SNr and GPi (which both receive most of their inhibitory inputs from striatum); differences in the morphology of purportedly GPe- (fast latency) versus striatum-mediated (slow latency) evoked field potentials [16]; and the presence of slow latency caudato-nigral evoked field potentials in slices [19] that are reversed by GABA antagonist application [20]. These points are indeed outlined in the first paragraph of the Discussion sub-section “Direct pathway plasticity.” However, we have now additionally added a point to the Limitations that inhibitory inputs to the GPi also come by way of GPe, though in a lesser abundance.

      [16] L. A. Steiner et al., “Persistent synaptic inhibition of the subthalamic nucleus by high frequency stimulation,” Brain Stimul, vol. 15, no. 5, pp. 1223–1232, 2022, doi: 10.1016/j.brs.2022.08.020.

      [17] L. D. Liu, I. A. Prescott, J. O. Dostrovsky, M. Hodaie, A. M. Lozano, and W. D. Hutchison, “Frequency-dependent effects of electrical stimulation in the globus pallidus of dystonia patients.,” J Neurophysiol, vol. 108, no. 1, pp. 5–17, Jul. 2012, doi: 10.1152/jn.00527.2011.

      [18] L. Milosevic et al., “Modulation of inhibitory plasticity in basal ganglia output nuclei of patients with Parkinson’s disease,” Neurobiology of Disease, vol. 124, pp. 46–56, Apr. 2019, doi: 10.1016/j.nbd.2018.10.020.

      [19] M. Yoshida and W. Precht, “Monosynaptic inhibition of neurons of the substantia nigra by caudato-nigral fibers,” Brain Res, vol. 32, no. 1, pp. 225–228, Sep. 1971, doi: 10.1016/0006-8993(71)90170-3.

      [20] W. Precht and M. Yoshida, “Blockage of caudate-evoked inhibition of neurons in the substantia nigra by picrotoxin,” Brain Res, vol. 32, no. 1, pp. 229–233, Sep. 1971, doi: 10.1016/0006-8993(71)90171-5.

      Amendments to the manuscript:

      “Indeed, GPi receives the greatest abundance of inhibitory inputs from striatum (direct pathway), but it also receives inhibitory inputs by way of GPe (indirect pathway). Although we can functionally disaggregate these pathway-specific responses based on differences in morphology and dynamics of GPe-mediated versus striatum-mediated inhibitory fEPs [21], the possibility of compounded effects cannot be completely ruled out.”

      The mechanism of short- and long-term plasticity as applied in the protocols used in this work are outlined in reference to previous citations [15, 16, 18]. Because this is a central aspect of the current work and interpreting the results, it was difficult to appreciate how these protocols provide distinct metrics of short and long-term plasticity in GPi without some explanation of how it applies to the current work and the specific mechanisms. It would also help to be able to better link how the results fit with the broader conclusions. 

      Short-term plasticity is measured as the dynamic change to the fEP during ongoing HFS. For long-term plasticity analyses, the fEP amplitudes during LFS were compared pre- versus post-HFS. To make this analysis more intuitive we have added a protocol illustration to Fig 2. We have moreover greatly expanded the discussion to include more literature related to disease-specific differences in plasticity, and implications of modulating plasticity using DBS.
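
      As a minimal sketch of the two metrics described above (not the authors' analysis code), long-term plasticity can be expressed as the percentage change in mean fEP amplitude pre- versus post-HFS, and short-term plasticity as the half-life of an exponential fitted to the fEP amplitudes of the first HFS pulses. The function names, the log-linear least-squares fit, and the example amplitudes are illustrative assumptions.

```python
import math

def percent_change(pre_amplitudes, post_amplitudes):
    """Long-term plasticity: % change in mean fEP amplitude, post- vs pre-HFS."""
    pre = sum(pre_amplitudes) / len(pre_amplitudes)
    post = sum(post_amplitudes) / len(post_amplitudes)
    return 100.0 * (post - pre) / pre

def exponential_half_life(amplitudes):
    """Short-term plasticity: fit A * exp(-k * n) to positive fEP amplitudes,
    where n is the pulse index, and return the half-life ln(2) / k in pulses.

    Assumption: the fit is done by ordinary least squares on log-amplitude;
    the original analysis may have used a different fitting routine.
    """
    n = list(range(len(amplitudes)))
    y = [math.log(a) for a in amplitudes]
    n_mean = sum(n) / len(n)
    y_mean = sum(y) / len(y)
    slope = (
        sum((ni - n_mean) * (yi - y_mean) for ni, yi in zip(n, y))
        / sum((ni - n_mean) ** 2 for ni in n)
    )
    k = -slope
    if k <= 0:
        return math.inf  # amplitudes do not decay, so there is no finite half-life
    return math.log(2) / k

# Hypothetical amplitudes (arbitrary units), for illustration only.
ltp = percent_change([1.0, 1.0], [1.5, 1.5])
half_life = exponential_half_life([8.0, 4.0, 2.0, 1.0, 0.5])
```

      The half-life here is in units of stimulus pulses; multiplying by the inter-pulse interval of the HFS train would convert it to time.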

      Amendments to the manuscript:

      Added new panel to Fig 2

      Author response image 1.

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, it has been shown that transcranial magnetic stimulation induced motor evoked potentials (MEPs) are hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to corroborate with this literature, one potential explanation can be that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are 12hr withdrawn from antiparkinsonian medications for surgery, it could be that striato-pallidal synapses are nevertheless chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may corroborate hyperkinetic features of dystonia and levodopa-induced dyskinesias in PD.”

      In the Conclusion, it was difficult to understand the sentence about microcircuit interaction (line 232) and how it selectively modulates the efficacy of target synapses. Some further explanation here would be helpful. Also, it was not clear how these investigations (line 237) provide cellular-level support for closed-loop targeting. Could the reference to closed-loop targeting also be further explained? 

      We agree with the reviewer that the current wording may be confusing. We have changed the wording to be clearer. We have additionally added content related to closed-loop DBS based on chronic monitoring of evoked potential responses.

      Amendments to the manuscript:

      “Furthermore, chronic monitoring of evoked fields may allow for tracking of subcortical neuronal projections as indexed by inhibitory fields reported in this study.”

      future applications of DBS may also benefit from closed loop tuning of basal-ganglia-thalamo-cortical circuit dynamics and plasticity through chronic monitoring of evoked potential responses [56].

      How is the burst index calculated (Methods)? 

      Thank you for pointing out that the burst index definition was missing from the paper. It has now been added to the manuscript.

      Amendments to the manuscript:

      “The burst index was computed by taking the ratio of the means from a two-component Gaussian mixture model applied to the log interspike interval distribution, a modification of the previous mode-over-mean ISI method [20]”
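
      The burst index definition above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the mixture is fitted with a small hand-written expectation-maximization loop, the ratio is taken as the long-ISI component over the short-ISI component after converting the fitted log-ISI means back to ISI units, and the function names and sample ISIs are hypothetical.

```python
import math

def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic initialization: quartiles as means, pooled variance.
    means = [xs[n // 4], xs[(3 * n) // 4]]
    grand_mean = sum(xs) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in xs) / n, 1e-6)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in xs:
            dens = [
                weights[j]
                * math.exp(-((v - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(2)
            ]
            total = dens[0] + dens[1]
            resp.append((dens[0] / total, dens[1] / total) if total > 0 else (0.5, 0.5))
        # M-step: update weights, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, xs)) / nj, 1e-6
            )
    return means, variances, weights

def burst_index(isis_seconds):
    """Ratio of the two fitted component means, expressed in ISI units.

    Assumption: long-ISI mean divided by short-ISI mean, so that a more
    bimodal (burstier) spike train gives a larger value.
    """
    log_isis = [math.log(isi) for isi in isis_seconds if isi > 0]
    means, _, _ = fit_two_gaussians(log_isis)
    return math.exp(max(means) - min(means))

# Hypothetical spike train: intra-burst ISIs near 5 ms, inter-burst ISIs near 100 ms.
sample_isis = [0.004, 0.005, 0.006] * 20 + [0.08, 0.1, 0.12] * 20
bi = burst_index(sample_isis)
```

      For a spike train without a clear bimodal ISI distribution, the two fitted means sit close together and the ratio approaches 1.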

      Figures and figure captions are missing some details:

      Fig. 1 - What does shading represent? 

      The shading in Fig. 1 illustrates results that were significant before adjustment for multiple comparisons.

      Amendments to the manuscript:

      “Depicted scatterplots are results that were significant before correction for multiple comparisons”

      Fig. 2 - Can the stimulation artifact be labeled so as not to be confused with the physiological signal? Is A representing the average of all patients or just one example? Are there confidence intervals for this data as it's not clear if the curves are significantly different or not (may not be important to show if just one example)? Same for D. What is being plotted in E? Is this the exponential fitted on data? Can this be stated in the figure citation directly so readers don't have to find it in the text, where it may not be directly obvious which figure the analyses are being applied towards? 

      Thank you for your comments regarding Fig. 2. We have made the following revisions to address the concerns:

      To clarify the presence of stimulation artifacts and differentiate them from the physiological signal, we have updated Panels B and E of Fig. 2, which now highlight the stimulation artifacts.

      Regarding the comment about Panel A (now B in the updated figure), it represents one single example per disease, rather than an average of all patients.

      In response to the comment about what is plotted in Panel E, we have revised the figure caption to explicitly state that it includes the exponential fit on the data.

      Amendments to the manuscript:

      Figure 2 panel B and E now highlight stimulation artifacts.

      Author response image 2.

      Author response image 3.

      The figure captions could use more details, that can be taken from the text, so that readers can understand figures without searching for relevant details across the paper. 

      Thank you for your feedback. We have revised the figure captions accordingly to provide more details.

      Amendments to the manuscript:

      “Fig 1 – GPi spiketrain feature analyses and clinical correlates of PD and dystonia. With respect to (A) rate-based spiketrain features, firing rate was greater in PD while burst index (BI) and coefficient of variation (CV) were greater in dystonia; whereas no differences were found for (B) oscillatory spiketrain features at theta, alpha, low beta, and high beta frequencies. MWU statistical results depicted are not corrected for multiple comparisons; after correction using the Bonferroni method, only CV and BI results remain significant (please see Supplementary Table 3). (C) In PD, the power of low beta spiketrain oscillations positively correlated (Spearman correlation) with symptom severity; in dystonia, neuronal firing rate negatively correlated with symptom severity, whereas CV and the power of theta spiketrain oscillations positively correlated with symptom severity. Depicted scatterplots are results that were significant before correction for multiple comparisons; however, none of the results persist after Benjamini-Hochberg correction for false discovery rate (please see Supplementary Table 4).”

      “Fig 2 – Long-term and short-term effects of HFS on striato-pallidal plasticity in PD and dystonia. (A) Schematic of the plasticity protocol to assess long-term plasticity via fEP amplitude comparisons pre- versus post-HFS and short-term plasticity via fEP dynamics during HFS. (B) Highlights example fEP traces for measuring long-term plasticity pre- versus post-HFS, with (C) displaying group-level fEP amplitudes pre- versus post-HFS across diseases. (D) Illustrates the amount of plasticity (i.e., percentage change in fEP amplitudes pre- versus post-HFS) in both PD and dystonia, with PD showing higher levels of plasticity. (E) Provides an example of fEP traces during HFS for assessing short-term plasticity, with (F) depicting group-level decay rates of fEP amplitudes using an exponential fit on the fEP amplitudes over the first 5 stimulus pulses across diseases. (G) Shows the half-life of the fitted exponential (i.e., rate of attenuation of fEP amplitudes) between PD and dystonia, with PD demonstrating faster fEP attenuation.”
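
      The Fig 1 caption above names two multiple-comparison corrections, Bonferroni and Benjamini-Hochberg. The following is a minimal sketch of how adjusted p-values are conventionally computed under each method; the function names and the example p-values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjustment for false discovery rate.

    Each p-value is scaled by m / rank (rank in ascending order), then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical uncorrected p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

      Bonferroni controls the family-wise error rate and is the stricter of the two, so a result can fail Bonferroni yet still pass the Benjamini-Hochberg threshold.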

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study by Paoli et al. used a resonant scanning multiphoton microscope to examine olfactory representation in the projection neurons (PNs) of the honeybee with improved temporal resolution. PNs were classified into 9 groups based on their response patterns. The authors found that excitatory responses in the PNs precede the inhibitory responses by ~40 ms, and ~50% of PN responses contain inhibitory components. They built a neural circuit model of the mushroom body (MB) with evolutionarily conserved features such as sparse representation, global inhibition, and a plasticity rule. This MB model fed with the experimental data could reproduce a number of phenomena observed in experiments using bees and other insects, including dynamical representations of odor onset and offset by different populations of Kenyon cells, prolonged representations of after-smell, different levels of odor specificity for early/delay conditioning, and shift of behavioral timing in delay conditioning. Trace conditioning was not modeled and tested experimentally. Also, the experimental result itself is largely confirmatory of preceding studies using other organisms. Nonetheless, the experimental data and the model provide a solid basis for future studies.

      We thank the reviewer for summarizing the value of our study and recognizing its generality and significance. As suggested, in a revised version of the manuscript, we will discuss the implications of our approach for the context of trace conditioning. The model we presented hinges on the learning-induced plasticity of KC-to-MBON synapses recruited during the learning window (i.e., the simulated US arrival). In the case of trace conditioning, the model predicts that the behavioral response time should match the expected US arrival. Contrary to this prediction, preliminary analyses on empirical measurements of PER latency upon trace conditioning indicate this is not the case. In a revised version of the manuscript, we will discuss the differences between the predictions of the model and the experimental observations in a trace conditioning paradigm.

      Reviewer #2 (Public Review):

      The study presented by Paoli et al. explores temporal aspects of neuronal encoding of odors and their perception, using bees as a general model for insects. The neuronal encoding of the presence of an odor is not a static representation; rather, its neuronal representation is partly encoded by the temporal order in which parallel olfactory pathways participate and are combined. This aspect is not novel, and its relevance in odor encoding and recognition has been discussed for more than the past 20 years. 

      The temporal richness of the olfactory code and its significance have traditionally been driven by results obtained based on electrophysiological methods with temporal resolution, allowing the identification and timing of the action potentials in the different populations of neurons whose combination encodes the identity of an odor. On the other hand, optophysiological methods that enable spatial resolution and cell identification in odor coding lack the temporal resolution to appreciate the intricacies of olfactory code dynamics. 

      (1) In this context, the main merit of Paoli et al.'s work is achieving an optical recording that allows for spatial registration of olfactory codes with greater temporal detail than the classical method and, at the same time, with greater sensitivity to measure inhibitions as part of the olfactory code. 

      The work clearly demonstrates how the onset and offset of odor stimulation triggers a dynamic code at the level of the first interneurons of the olfactory system that changes at every moment as a natural consequence of the local inhibitory interactions within the first olfactory neuropil, the antennal lobe. This gives rise to the interesting theory that each combination of activated neurons along this temporal sequence corresponds to the perception of a different odor. The extent to which the corresponding postsynaptic layers integrate this temporal information to drive the perception of an odor, or whether this sequence is, in a sense, a journey through different perceptions, is challenging to address experimentally. 

      In their work, the authors propose a computational approach and olfactory learning experiments in bees to address these questions and evaluate whether the sequence of combinations drives a sequence of different perceptions. In my view, it is a highly inspiring piece of work that still leaves several questions unanswered. 

      We thank the reviewer for considering that our work has an inspiring nature. Below we have tried to answer the questions raised by the following comments, and we will include part of these answers in the revised version of our manuscript.

      (2) In my opinion, the detailed temporal profile of the response of projection neurons and their respective probabilities of occurrence provide valuable information for understanding odor coding at the level of neurons transferring information from the antennal lobes to the mushroom bodies. An analysis of these probabilities in each animal, rather than in the population of animals that were measured, would aid in better comprehending the encoding function of such temporal profiles. Being able to identify the involved glomeruli and understanding the extent to which the sequence of patterns and inhibitions is conserved for each odor across different animals, as it is well known for the initial excitatory burst of activity observed in previous studies without the fine temporal detail, would also be highly significant. 

      We thank the reviewer for recognizing the relevance of the findings in understanding the logic of olfactory coding. We agree about the importance of establishing if the different glomerular response profiles are evenly distributed across individuals or have individual biases. In the revised version of the manuscript, we will provide data on the distribution of response profiles for each animal and for different olfactory stimuli. Also, we fully agree on the importance of assessing to what extent such response profiles - largely determined by the local network of AL interneurons - are glomerulus-specific and conserved across individuals.

      In my view, the computational approach serves as a useful tool to inspire future experiments; however, it appears somewhat simplistic in tackling the complexity of the subject. One question that I believe the researchers do not address is to what extent the inhibitions recorded in the projection neurons are integrated by the Kenyon cells and are functional for generating odor-specific patterns at that level. 

      The model we proposed represents, indeed, a simplification of olfactory signal processing throughout the honey bee olfactory circuit. Still, it shows that simple but realistic rules can be sufficient to grasp some fundamental aspects of olfactory coding. However, we agree with the reviewer and believe that such a minimalistic model can provide a basis for designing future experiments in which complexity can be increased by adding relevant features, such as the learning-induced plasticity of PN-to-KC synapses or the divergence of multiple PNs from the same glomerulus to different KCs.

      Concerning the reviewer's question on the involvement of inhibitory inputs in generating odor-specific patterns at the level of the KCs, the short answer is yes: they contribute to the summed input of a target KC, and thus to the odor representation. In designing the model, we considered that a given glomerulus provides maximal input at maximal excitation and minimal input (= 0 input) at maximal inhibition. For this reason, an inhibited glomerulus contributes less (to KC action potential probability) than a glomerulus showing baseline activity. This, in turn, contributes less than an excited glomerulus. From the modeling point of view, normalizing the signal between 0 and 1 (i.e., setting maximal inhibition to 0 and maximal excitation to 1) would yield a similar result as with the current approach, where values range from -25% to +30% ΔF/F. We will amend the model's description to clarify this point.
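      The normalization mentioned above can be illustrated with a minimal sketch. It assumes a simple linear (min-max) rescaling of the relative fluorescence change over the quoted range of -25% to +30%; the function name and the clipping step are hypothetical and are not taken from the model's actual code.

```python
def normalize_response(dff, min_dff=-25.0, max_dff=30.0):
    """Linearly rescale a glomerular response (in % fluorescence change) to [0, 1].

    min_dff maps to 0 (maximal inhibition) and max_dff maps to 1 (maximal
    excitation). Values outside the range are clipped (an assumption made
    only for this illustration).
    """
    clipped = max(min_dff, min(max_dff, dff))
    return (clipped - min_dff) / (max_dff - min_dff)


# An inhibited glomerulus contributes less than one at baseline,
# which in turn contributes less than an excited one.
for label, dff in [("inhibited", -25.0), ("baseline", 0.0), ("excited", 30.0)]:
    print(label, round(normalize_response(dff), 3))
```

      Under this rescaling the ordering inhibited < baseline < excited is preserved, which is why it would be expected to give results similar to working with the raw values.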

      Lastly, the behavioral result indicating a difference in conditioned response latency after early or delayed learning protocol is interesting. However, it does not align with the expected time for the neuronal representation that was theoretically rewarded in the delayed protocol. This final result does not support the authors' interpretation regarding the existence of a smell and an after-smell as separate percepts that can serve as conditioned stimuli.

      Considering that our odor stimulus lasted 5 seconds, glomerular activity is highly variable at odor onset (i.e., within the first 1s) because of short excitatory response profiles and the delayed and slower onset of inhibitory responses. After the initial phase, the neural representation of the stimulus becomes more stable. Consequently, a neural signature learned in the case of delay conditioning, i.e., with the US appearing towards the end of the olfactory stimulation (t = 4 - 5s), may present itself much earlier (t = 1.5s), triggering a behavioral response that largely anticipates the expected US arrival time. 

      In the model, we observe an early decrease in action potential probability even in the case of delay conditioning. This occurs because the synapses recruited during the last second of olfactory stimulation (within the learning window during which CS and US overlap) become inactive. Because odorant-induced activity recruits highly overlapping synaptic populations between 1.5 and 5 s from the onset, a learning-induced inactivation of part of these synapses will result in a reduced action-potential probability in the modeled MBON. Importantly, this event will not be governed by time but by the appearance of the learned synaptic configuration. 
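      The argument above can be sketched with a toy example. This is not the published model: the synapse IDs, the two activity patterns, the time bins, and the 50% threshold are all hypothetical, chosen only to show how inactivating synapses during a late learning window can produce an early change in the modeled MBON.

```python
# Toy illustration only; all numbers below are hypothetical.

def active_synapses(t):
    """Set of KC-to-MBON synapse IDs recruited t seconds after odor onset."""
    if t < 1.5:
        return set(range(0, 10))   # onset-specific pattern
    return set(range(8, 20))       # stable pattern, shared from 1.5 s to 5 s


def mbon_drive(t, depressed):
    """Number of active synapses that have not been inactivated by learning."""
    return len(active_synapses(t) - depressed)


times = [0.5 * i for i in range(1, 11)]  # 0.5 s ... 5.0 s

# Delay conditioning: CS and US overlap only during the last second (4-5 s),
# so only synapses active in that window are inactivated.
depressed = set()
for t in times:
    if 4.0 <= t <= 5.0:
        depressed |= active_synapses(t)

naive = {t: mbon_drive(t, set()) for t in times}
trained = {t: mbon_drive(t, depressed) for t in times}

# First time bin at which the trained drive falls below half the naive drive.
first_drop = next(t for t in times if trained[t] < 0.5 * naive[t])
print("first reduced response at", first_drop, "s")
```

      In this sketch the drop appears as soon as the learned synaptic configuration is recruited, well before the 4-5 s learning window, which is the sense in which the event is governed by the pattern rather than by time.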

      We will add a new section to the revised version of the manuscript to clarify this concept and perform further analyses to characterize the contribution of different response types to the modeled response latency.

    1. Author response:

      The following is the response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their manuscript titled "Drosophila model to clarify the pathological significance of OPA1 in autosomal dominant optic atrophy," Nitta et al. use human OPA1 (hOPA1) to try to rescue the phenotype of an OPA1 +/- Drosophila DOA model (dOPA); the novelty of the paper lies in this approach. The authors then use this model to investigate the differences between dominant-negative and haploinsufficient OPA1 variants. The value of this paper lies in the study of DN/HI variants rather than the establishment of the Drosophila model per se, as this has existed for some time and does have some significant disadvantages compared to existing models, particularly in the extra-ocular phenotype which is common with some OPA1 variants but not in humans. I judge the findings of this paper to be valuable with regards to significance and solid with regards to the strength of the evidence.

      Suggestions for improvements:

      (1) Stylistically the results section appears to have significant discussion/conclusion/inferences in section with reference to existing literature. I feel that this information would be better placed in the separate discussion section. E.g. lines 149-154.

      We appreciate the reviewer’s suggestion to relocate the discussion, conclusions, and inferences, particularly those that reference existing literature, to a separate discussion section. For lines 149–154, we placed them in the discussion section (lines 343–347) as follows. “Our established fly model is the first simple organism to allow observation of degeneration of the retinal axons. The mitochondria in the axons showed fragmentation of mitochondria. Former studies have observed mitochondrial fragmentation in S2 cells (McQuibban et al., 2006), muscle tissue (Deng et al., 2008), segmental nerves (Trevisan et al., 2018), and ommatidia (Yarosh et al., 2008) due to the LOF of dOPA1.”

      For lines 178–181, we also placed them in the discussion section (lines 347–351) as follows. “Our study presents compelling evidence that dOPA1 knockdown instigates neuronal degeneration, characterized by a sequential deterioration at the axonal terminals and extending to the cell bodies. This degenerative pattern, commencing from the distal axons and progressing proximally towards the cell soma, aligns with the paradigm of 'dying-back' neuropathy, a phenomenon extensively documented in various neurodegenerative disorders (Wang et al., 2012). ”

      For lines 213–217, 218–220, and 222–223, we also placed them in the discussion section (lines 363– 391) as follows. “To elucidate the pathophysiological implications of mutations in the OPA1 gene, we engineered and expressed several human OPA1 variants, including the 2708-2711del mutation, associated with DOA, and the I382M mutation, located in the GTPase domain and linked to DOA. We also investigated the D438V and R445H mutations in the GTPase domain and correlated with the more severe DOA plus phenotype. The 2708-2711del mutation exhibited limited detectability via HA-tag probing. Still, it was undetectable with a myc tag, likely due to a frameshift event leading to the mutation's characteristic truncated protein product, as delineated in prior studies (Zanna et al., 2008). Contrastingly, the I382M, D438V, and R445H mutations demonstrated expression levels comparable to the WT hOPA1. However, the expression of these mutants in retinal axons did not restore the dOPA1 deficiency to the same extent as the WT hOPA1, as evidenced in Figure 5E. This finding indicates a functional impairment imparted by these mutations, aligning with established understanding (Zanna et al., 2008). Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). 
      A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (2) I do think further investigation as to why a reduction of mitochondria was noticed in the knockdown. There are conflicting reports on this in the literature. My own experience of this is fairly uniform mitochondrial number in WT vs OPA1 variant lines but with an increased level of mitophagy presumably reflecting a greater turnover. There are a number of ways to quantify mitochondrial load e.g. mtDNA quantification, protein quantification for tom20/hsp60 or equivalent. I feel the reliance on ICC here is not enough to draw conclusions. Furthermore, mitophagy markers could be checked at the same time either at the transcript or protein level. I feel this is important as it helps validate the drosophila model as we already have a lot of experimental data about the number and function of mitochondria in OPA+/- human/mammalian cells.

      We thank the reviewer for the insightful comments and suggestions regarding our study on the impact of mitochondrial reduction in a knockdown model. We concur with the reviewer’s observation that our initial results did not definitively demonstrate a decrease in the number of mitochondria in retinal axons. Furthermore, we measured mitochondrial quantity by conducting western blotting using anti-COXII and found no reduction in mitochondrial content with the knockdown of dOPA1 (Figure S4A and B). Consequently, we have revised our manuscript to remove the statement “suggesting a decreased number of mitochondria in retinal axons. However, whether this decrease is due to degradation resulting from a decline in mitochondrial quality or axonal transport failure remains unclear.” Instead, we have refocused our conclusion to reflect our electron microscopy findings, which indicate reduced mitochondrial size and structural abnormalities. The reviewer’s observation of consistent mitochondrial numbers in WT versus mutant variant lines and elevated mitophagy levels prompted us to evaluate mitochondrial turnover as a significant factor in our study. Regarding verifying mitophagy markers, we incorporated the mito-QC marker in our experimental design. In our experiments, mito-QC was expressed in the retinal axons of Drosophila to assess mitophagy activity upon dOPA1 knockdown. We observed a notable increase in mCherry positive but GFP negative puncta signals one week after eclosion, indicating the activation of mitophagy (Figure 2D–H). This outcome strongly suggests that dOPA1 knockdown enhances mitophagy in our Drosophila model. The application of mito-QC as a quantitative marker for mitophagy, validated in previous studies, offers a robust approach to analyzing this process. Our findings elucidate the role of dOPA1 in mitochondrial dynamics and its implications for neuronal health. These results have been incorporated into Figure 2, with the corresponding text updated as follows (lines 159–167): “Given that an increase in mitophagy activity has been reported in mouse RGCs and nematode ADOA models (Zaninello et al., 2022; Zaninello et al., 2020), the mito-QC marker, an established indicator of mitophagy activity, was expressed in the photoreceptors of Drosophila. The mito-QC reporter consists of a tandem mCherry-GFP tag that localizes to the outer membrane of mitochondria (Lee et al., 2018). This construct allows the measurement of mitophagy by detecting an increase in the red-only mCherry signal when the GFP is degraded after mitochondria are transported to lysosomes. Post dOPA1 knockdown, we observed a significant elevation in mCherry positive and GFP negative puncta signals at one week, demonstrating an activation of mitophagy as a consequence of dOPA1 knockdown (Figure 2D–H).”

      (3) Could the authors comment on the failure of the dOPA1 rescue to return their biomarker, axonal number to control levels. In Figure 4D is there significance between the control and rescue. Presumably so as there is between the mutant and rescue and the difference looks less.

      As the reviewer correctly pointed out, there is a significant difference between the control and rescue groups, which we have now included in the figure. Additionally, we have incorporated the following comments in the discussion section (lines 329–342) regarding this significant difference: “In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.”

      (4) The authors have chosen an interesting if complicated missense variant to study, namely the I382M with several studies showing this is insufficient to cause disease in isolation and appears in high frequency on gnomAD but appears to worsen the phenotype when it appears as a compound het. I think this is worth discussing in the context of the results, particularly with regard to the ability for this variant to partially rescue the dOPA1 model as shown in Figure 5.

      As the reviewer pointed out, the I382M mutation is known to act as a disease modifier. However, in our system, as suggested by Figure 5, I382M appears to retain more activity than DN mutations. Considering previous studies, we propose that I382M represents a mild hypomorph. Consequently, while I382M alone may not exhibit a phenotype, it could exacerbate severity in a compound heterozygous state. We have incorporated this perspective in our revised discussion (lines 375-391).

      “Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (5) I feel the main limitation of this paper is the reliance on axonal number as a biomarker for OPA1 function and ultimately rescue. I have concerns because a) this is not a well validated biomarker within the context of OPA1 variants b) we have little understanding of how this is affected by over/under expression and c) if it is a threshold effect e.g. once OPA1 levels reach <x% pathology develops but develops normally when opa1 expression is >x%. I think this is particularly relevant when the authors are using this model to make conclusions on dominant negativity/HI with the authors proposing that if expression of a hOPA1 transcript does not increase opa1 expression in a dOPA1 KO then this means that the variant is DN. The authors have used other biomarkers in parts of this manuscript e.g. ROS measurement and mito trafficking but I feel this would benefit from something else particularly in the latter experiments demonstrated in figure 5 and 6.

      The reviewer raised concerns regarding the adequacy of axonal count as a validated biomarker in the context of OPA1 mutants. In response, we corroborated its validity using markers such as MitoSOX, Atg8, and COXII. Experiments employing MitoSOX revealed that the augmented ROS signals resulting from dOPA1 knockdown were mitigated by expressing human OPA1. Conversely, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate these effects, paralleling the phenotype of axonal degeneration observed. These findings are documented in Figure 5F, and we have incorporated the following text into section lines 248–254 of the results:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by comparing the ratio of Atg8-1 to Atg8-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to section lines 255–263 of the results:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knockdown dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8-1 to Atg8-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      The reviewer also inquired about the effects of overexpressing and underexpressing OPA1 on axonal count and whether these effects are subject to a threshold. In response, we expressed both wild-type and variant forms of human OPA1 in Drosophila in vivo and assessed their protein levels using Western blot analysis. The results showed no significant differences in expression levels between the wild-type and variant forms in the OPA1 overexpression experiments, suggesting the absence of a variation threshold effect. These findings have been newly documented as quantitative data in Figure 5C. Furthermore, we have included a statement in the results section for Figure 6A, clarifying that overexpression of hOPA1 exhibited no discernible impact, as detailed on lines 274–276.

      “The results presented in Figure 5C indicate that there are no significant differences in the expression levels among the variants, suggesting that variations in expression levels do not influence the outcomes.”

      (6) Could the authors clarify what exons in Figure 5 are included in their transcript. My understanding is transcript NM_015560.3 contains exon 4,4b but not 5b. According to Song 2007 this transcript produces invariably s-OPA1 as it contains the exon 4b cleavage site. If this is true, this is a critical limitation in this study and in my opinion significantly undermines the likelihood of the proposed explanation of the findings presented in Figure 6. The primarily functional location of OPA1 is at the IMM and l-OPA1 is the primary opa1 isoform probably only that localizes here as the additional AA act as a IMM anchor. Given this is where GTPase likely oligomerizes the expression of s-OPA1 only is unlikely to interact anyway with native protein. I am not aware of any evidence s-OPA1 is involved in oligomerization. Therefore I don't think this method and specifically expression of a hOPA1 transcript which only makes s-OPA1 to be a reliable indicator of dominant negativity/interference with WT protein function. This could be checked by blotting UAS-hOPA1 protein with a OPA1 antibody specific to human OPA1 only and not to dOPA1. There are several available on the market and if the authors see only s-OPA1 then it confirms they are not expressing l-OPA1 with their hOPA1 construct.

      As suggested by the reviewer, we performed a Western blot using a human OPA1 antibody to determine if the expressed hOPA1 was producing the l-OPA1 isoform, as shown in band 2 of Figure 5D. The results confirmed the presence of both l-OPA1 and what appears to be s-OPA1 in bands 2 and 4, respectively. These findings are documented in the updated Figure 5D, with a detailed description provided in the manuscript at lines 224-226. Additionally, the NM_015560.3 refers to isoform 1, which includes only exons 4 and 5, excluding exons 4b and 5b. This isoform can express both l-OPA1 and s-OPA1 (refer to Figure 1 in Song et al., J Cell Biol. 2007). We have updated the schematic diagram in the figure to include these exons. The formation of s-OPA1 through cleavage occurs at the OMA1 target site located in exon 5 and the Yme1L target site in exon 5b of OPA1. Isoform 1 of OPA1 is prone to cleavage by OMA1, but a homologous gene for OMA1 does not exist in Drosophila. Although a homologous gene for Yme1L is present in Drosophila, exon 5b is missing in isoform 1 of OPA1, leaving the origin of the smaller band resembling s-OPA1 unclear at this point.

      Reviewer #2 (Public Review):

      The data presented support and extend some previously published data using Drosophila as a model to unravel the cellular and genetic basis of human Autosomal dominant optic atrophy (DOA). In human, mutations in OPA1, a mitochondrial dynamin like GTPase (amongst others), are the most common cause for DOA. By using a Drosophila loss-of-function mutations, RNAi- mediated knockdown and overexpression, the authors could recapitulate some aspects of the disease phenotype, which could be rescued by the wild-type version of the human gene. Their assays allowed them to distinguish between mutations causing human DOA, affecting the optic system and supposed to be loss-of-function mutations, and those mutations supposed to act as dominant negative, resulting in DOA plus, in which other tissues/organs are affected as well. Based on the lack of information in the Materials and Methods section and in several figure legends, it was not in all cases possible to follow the conclusions of the authors.

      We appreciate the reviewer's constructive feedback and the emphasis on enhancing clarity in our manuscript. We recognize the concerns raised about the lack of detailed information in the Materials and Methods section and several figure legends, which may have obscured our conclusions. In response, we have appended the detailed genotypes of the Drosophila strains used in each experiment to a supplementary table. Additionally, we realized that the description of 'immunohistochemistry and imaging' was too brief, previously referenced simply as “immunohistochemistry was performed as described previously (Sugie et al., 2017).” We have now expanded this section to include comprehensive methodological details. Furthermore, we have revised the figure legends to provide clearer and more thorough descriptions.

      Similarly, how the knowledge gained could help to "inform early treatment decisions in patients with mutations in hOPA1" (line 38) cannot be followed.

      To address the reviewer's comments, we have refined our explanation of the clinical relevance of our findings as follows. We believe this revision succinctly articulates the practical application of our research, directly responding to the reviewer’s concerns about linking the study's outcomes to treatment decisions for patients with hOPA1 mutations. By underscoring the model’s value in differential diagnosis and its influence on initiating treatment strategies, we have clarified this connection explicitly, within the constraints of the abstract’s word limit. The revised sentence now reads: "This fly model aids in distinguishing DOA from DOA plus and guides initial hOPA1 mutation treatment strategies."

      Reviewer #3 (Public Review):

      Nitta et al. establish a fly model of autosomal dominant optic atrophy, of which hundreds of different OPA1 mutations are the cause with wide phenotypic variance. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a clear mitochondrial phenotype with multiple clinical analogs including optic atrophy measured as axonal degeneration. They then show that hOPA1 mitigates optic atrophy with the same efficacy as dOPA1, setting up the utility of their model to test disease-causing hOPA1 variants. Finally, they leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, overall solid design and execution, and a generally clear and thorough interpretation of their results. The results technically support their primary conclusions with caveats. The first is that both dOPA1 and hOPA1 fail to fully restore optic axonal integrity, yet the authors fail to acknowledge that this only constitutes a partial rescue, nor do they discuss how this fact might influence our interpretation of their subsequent results.

      As the reviewer rightly points out, neither dOPA1 nor hOPA1 achieve a complete recovery. Therefore, we acknowledge that this represents only a partial rescue and have added the following explanations regarding this partial rescue in the results and discussion sections.

      Result:

      Significantly → partially (lines 207 and 228)

      Discussion (lines 329–342):

      In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.

      The second caveat is that their effect sizes are small. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. The authors might have considered exploring the impact of these variants on other mitochondrial outcome measures they established earlier on. They might also consider providing some functional context for this marginal difference in axonal optic nerve degeneration.

      In response to the reviewer’s comment regarding the modest effect sizes observed, we acknowledge that the magnitude of the reported changes is indeed small. To explore the impact of these variants on additional mitochondrial outcomes as suggested, we employed markers such as MitoSOX, Atg8, and COXII for validation. However, we could not detect any significant effects of the DOA plus-associated variants using these methods. We apologize for the redundancy, but to address Reviewer #1's fifth question, we present experimental results showing that while the increased ROS signals observed upon dOPA1 knockdown were rescued by expressing human OPA1, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate this effect. This outcome mirrors the axonal degeneration phenotype and is documented in Figure 5F. The following text has been added to the results section lines 248–254:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by comparing the ratio of Atg8-1 to Atg8-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to section lines 255–263 of the results:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knockdown dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8-1 to Atg8-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      Despite these caveats, the authors provide the first animal model of DOA that also allows for rapid assessment and mechanistic testing of suspected OPA1 variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is under-stated considering how important this question is in development of genetic treatments for DOA. The authors discuss important points regarding the potential utility of this model in clinical science. Comments on the potential use of this model to investigate variants of unknown significance in clinical diagnosis require further discussion of whether there is indeed precedent for this in other genetic conditions (since the model is nevertheless so evolutionarily removed from humans).

      As suggested by the reviewer, we have expanded the discussion in our study to emphasize in greater detail the significance of the fruit fly model and the MeDUsA software we have developed, elaborating on the model's potential applications in clinical science and its precedents in other genetic disorders. Our text is as follows (lines 299–318):

      “We have previously utilized MeDUsA to quantify axonal degeneration, applying this methodology extensively to various neurological disorders. The robust adaptability of this experimental system is demonstrated by its application in exploring a wide spectrum of genetic mutations associated with neurological conditions, highlighting its broad utility in neurogenetic research. We identified a novel de novo variant in Spliceosome Associated Factor 1, Recruiter of U4/U6.U5 Tri-SnRNP (SART1). The patient, born at 37 weeks with a birth weight of 2934g, exhibited significant developmental delays, including an inability to support head movement at 7 months, reliance on tube feeding, unresponsiveness to visual stimuli, and development of infantile spasms with hypsarrhythmia, as evidenced by EEG findings. Profound hearing loss and brain atrophy were confirmed through MRI imaging. To assess the functional impact of this novel human gene variant, we engineered transgenic Drosophila lines expressing both wild type and mutant SART1 under the control of a UAS promoter.

      Our MeDUsA analysis suggested that the variant may confer a gain-of-toxic-function (Nitta et al.,  2023). Moreover, we identified heterozygous loss-of-function mutations in DHX9 as potentially causative for a newly characterized neurodevelopmental disorder. We further investigated the pathogenic potential of a novel heterozygous de novo missense mutation in DHX9 in a patient presenting with short stature, intellectual disability, and myocardial compaction. Our findings indicated a loss of function in the G414R and R1052Q variants of DHX9 (Yamada et al., 2023). This experimental framework has been instrumental in elucidating the impact of gene mutations, enhancing our ability to diagnose how novel variants influence gene function.”

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall I enjoyed reading this paper. It is well presented and represents a significant amount of well executed study. I feel it further characterizes a poorly understood model of OPA1 variants and one which displays significant differences with the human phenotype. However, I feel the use of this model with the authors' experiments is not enough to validate this model/experiment as a screening tool for dominant negativity. I have therefore suggested the above experiments as a way to both further validate the mitochondrial dysfunction in this model and to ensure that the expressed transcript is able to affect oligomerization, as this is a pre-requisite to the authors' conclusions.

      We assessed the extent to which our model reflects mitochondrial dysfunction using COXII, Atg8, and MitoSOX markers. Unfortunately, neither COXII levels nor the ratio of Atg8a-1 to Atg8a-2 showed significant variations across genotypes that would clarify the impact of dominant negative mutations. Nonetheless, MitoSOX and mito-QC results revealed that mitochondrial ROS levels and mitophagy are increased in Drosophila following intrinsic knockdown of dOPA1. These findings are documented in Figures 2, 5, and S6.

      Regarding oligomer formation, the specifics remain elusive in this study. However, the expression of dOPA1K273A, identified as a dominant negative variant in Drosophila, significantly disrupted retinal axon organization, as detailed in Figure S7. From these observations, we hypothesize that oligomerization of wild-type and dominant negative forms in Drosophila results in axonal degeneration. Conversely, co-expression of Drosophila wild-type with human dominant negative forms does not induce degeneration, suggesting that they likely do not interact.

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      The authors used GMR-Gal4 to express OPA1-RNAi. I) GMR is expressed in most cells in the developing eye behind the morphogenetic furrow. So the defects observed can be due to knock-down in support cells rather than in photoreceptor cells.

      We have added the following sentences to the results (lines 194–196): “The GMR-Gal4 driver does not exclusively target Gal4 expression to photoreceptor cells. Consequently, the observed retinal axonal degeneration could potentially be secondary to abnormalities in support cells external to the photoreceptors.”

      OPA1-RNAi: how complete is the knock-down? Have the authors tested more than one RNAi line?

      We conducted experiments with an additional RNAi line, and similarly observed degeneration in the retinal axons (Figure S2 A and B; lines 178–179).

      The loss-of-function allele, induced by a P-element insertion, gives several eye phenotypes when heterozygous (Yarosh et al., 2008). Does RNAi expression lead to the same phenotypes?

      A previous report indicated that the compound eyes of homozygous mutations of dOPA1 displayed a glossy eye phenotype (Yarosh et al., 2008). Upon knocking down dOPA1 using the GMR-Gal4 driver, we also observed a glossy eye-like rough eye phenotype in the compound eyes. These findings have been added to Figure S3 and lines 192–194.

      There is no description of the way the somatic clones were generated. How were mutant cells in clones distinguished from wild-type cells (e.g., in Fig. 4)?

      In the Methods section, we described the procedure for generating clones and their genotypes as follows (lines 502–505): "The dOPA1 clone analysis was performed by inducing flippase expression in the eyes using either ey-Gal4 with UAS-flp or ey3.5-flp, followed by recombination at the chromosomal location FRT42D to generate a mosaic of cells homozygous for dOPA1s3475." Furthermore, we have created a table detailing these genotypes. In these experiments, it was not possible to differentiate between the clone and WT cells. Accordingly, we have noted in the Results section (lines 201–203): "Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.”

      Why were flies kept at 29°C? This is rather unusual.

      Increased temperature was demonstrated to induce elevated expression of GAL4 (Kramer and Staveley, Genet. Mol. Res., 2003), which in turn led to an enhanced expression of the target genes. Therefore, experiments involving knockdown assays or Western blotting to detect human OPA1 protein were exclusively conducted at 29°C. However, all other experiments were performed at 25°C, as described in the methods sections: “Flies were maintained at 25°C on standard fly food. For knockdown experiments (Figures 1C–E, 1F–H, 2A–H, 3B–K, 5F, S1, S2 A and B, and S6A), flies were kept at 29°C in darkness.” Furthermore, “We regulated protein expression temporally across the whole body using the Tub-Gal4 and Tub-GAL80TS system. Flies harboring each hOPA1 variant were maintained at a permissive temperature of 20°C, and upon emergence, females were transferred to a restrictive temperature of 29°C for subsequent experiments.”

      Legends:

      It would be helpful to have a description of the genotypes of the flies used in the different experiments. This could also be included as a table.

      We have created a table detailing the genotypes. Additionally, in the legend, we have included a note to consult the supplementary table for genotypes.

      Results:

      Line 141: It is not clear what they mean by "degradation", is it axonal degeneration? And if so, what is the argument for this here?

      In the manuscript, we addressed the potential for mitochondrial degradation; however, recognizing that the expression was ambiguous, the following sentence has been omitted: "Nevertheless, the degradation resulting from mitochondrial fragmentation may have decreased the mitochondrial signal.”

      Fig. 2: Axons of which photoreceptors are shown?

      We have added "a set of the R7/8 retinal axons" to the legend of Figure 2.

      Line 167: The authors write that axonal degeneration is more severe after seven days than after eclosion. Is this effect light-dependent? The same question concerns the disappearance of the rhabdomere (Fig. 3G–J).

      We conducted the experiments in darkness, ensuring that the observed degeneration is not light-dependent. This condition has been added to the methods section to clarify the experimental conditions.

      Line 178/179: Based on what results do they conclude that there is degeneration of the "terminals" of the axons?

      Quantification via MeDUsA has enabled us to count the number of axonal terminals, and a noted decrease has led us to conclude axonal terminal degeneration. We have published two papers on these findings. We have added the following description to the results section to clarify how we defined degeneration (lines 174–176): “We have assessed the extent of their reduction from the total axonal terminal count, thereby determining the degree of axonal terminal degeneration (Richard JNS 2022; Nitta HMG 2023).”

      Line 189: They write: ".. we observed dOPA1 mutant axons...". How did they distinguish the mutant axons from the controls?

      Fig. 5 and Fig. 6: How did they distinguish genetically mutant cells from genetically control cells in the somatic clones?

      Mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them. Accordingly, this point has been added to lines 201–203, “Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.” and the text in the results section has been modified as follows:

      (Before) “To determine if dOPA1 is responsible for axon neurodegeneration, we observed the dOPA1 mutant axons by expressing full-length versions of dOPA1 in the photoreceptors at one day after eclosion and found that dOPA1 expression significantly rescued the axonal degeneration” —>

      (After) “To determine if dOPA1 is responsible for axon neurodegeneration, we quantify the number of the axons in the dOPA1 eye clone fly with the expression of dOPA1 at one day after eclosion and found that dOPA1 expression partially rescued the axonal degeneration”

      Line 225/226: It is not clear to me how their approach "can quantitatively measure the degree of LOF".

      To address the reviewer's question and clarify how our approach quantitatively measures the degree of loss of function (LOF), we revised the statement (lines 238–247):

      "Our methodology distinctively facilitates the quantitative evaluation of LOF severity by comparing the rescue capabilities of various mutations. Notably, the 2708-2711del and I382M mutations demonstrated only partial rescue, indicative of a hypomorphic effect with residual activity. In contrast, the D438V and R445H mutations failed to show significant rescue, suggesting a more profound LOF. The correlation between the partial rescue by the 2708-2711del and I382M mutations and their classification as hypomorphic is significant. Moreover, the observed differences in rescue efficacy correspond to the clinical severities associated with these mutations, namely in DOA and DOA plus disorders. Thus, our results substantiate the model’s ability to quantitatively discriminate among mutations based on their impact on protein functionality, providing an insightful measure of LOF magnitude.”

      Discussion:

      Line 251, 252 and line 358: What is "the optic nerve" in the adult Drosophila?

      In humans, the axons of retinal ganglion cells (RGCs) are referred to as the optic nerve, and we posit that the retinal axons in flies are similar to this structure. In the introduction section, where it is described that the visual systems of flies and humans bear resemblance, we have appended the following definition (lines 107–108): “In this study, we defined the retinal axons of Drosophila as analogous to the human optic nerve.”

      Line 344: These bands appear only upon overexpression of the hOPA1 constructs, so this part of the discussion is very speculative.

      Confirmation was achieved using anti-hOPA1, demonstrating that myc is not nonspecific. These results have been added to Figure 5D. Furthermore, the phrase “The upper band was expected as” has been revised to “From a size perspective, the upper band was inferred to represent the full-length hOPA1 including the mitochondria import sequence (MIS).” (lines 464–465)

      I was missing a discussion about the increase of ROS upon loss/reduction of dOPA1 observed by others and described here. Is there an increase of ROS upon expression of any of the constructs used?

      We demonstrated that not only axonal degeneration but also ROS can be suppressed by expressing human OPA1 in the genetic background of dOPA1 knockdown. Additionally, rescue was not possible with any variants except for I382M. Furthermore, we assessed whether there were changes in ROS in the evaluation of dominant negatives, but no significant differences were observed in this experimental system. These findings have been added to the discussion section as follows (lines 318–328): “Our research established that dOPA1 knockdown precipitates axonal degeneration and elevates ROS signals in retinal axons. Expression of human OPA1 within this context effectively mitigated both phenomena; it partially reversed axonal degeneration and nearly completely normalized ROS levels. These results imply that factors other than increased ROS may drive the axonal degeneration observed post-knockdown. Furthermore, while differences between the impacts of DN mutations and loss-of-function mutations were evident in axonal degeneration, they were less apparent when using ROS as a biomarker. The extensive use of transgenes in our experiments might have mitigated the knockdown effects. In a systemic dOPA1 knockdown, assessments of mitochondrial quantity and autophagy activity revealed no significant changes, suggesting that the cellular consequences of reduced OPA1 expression might vary across different cell types.”

      Reviewer #3 (Recommendations For The Authors):

      Consider being more explicit regarding literature that has or has failed to test a direct dominant negative effect by expressing a variant in question in the background of a full OPA1 complement. My understanding is that this is the first direct evidence of this widely held hypothesis. This lends to the main claim promoting the utility of fly as a model in general. The authors might also outline this in the introduction as a knowledge gap they fill through this study.

      In the introduction, we have incorporated a passage that highlights precedents capable of distinguishing between LOF and DN effects, and we note the absence of models capable of dissecting these distinctions within an in vivo organism. This study aims to address this gap, proposing a model that elucidates the differential impacts of LOF and DN within the context of a living model organism, thereby contributing to a deeper understanding of their roles in disease pathology. We added the following sentences in the introduction (lines 71–80).

      “In the quest to differentiate between LOF and DN effects within the context of genetic mutations, precedents exist in simpler systems such as yeast and human fibroblasts. These models have provided valuable insights into the conserved functions of OPA1 across species, as evidenced by studies in yeast models (Del Dotto et al., 2018) and fibroblasts derived from patients harboring OPA1 mutations (Kane et al., 2017). However, the ability to distinguish between LOF and DN effects in an in vivo model organism, particularly at the structural level of retinal axon degeneration, has remained elusive. This gap underscores the necessity for a more complex model that not only facilitates molecular analysis but also enables the examination of structural changes in axons and mitochondria, akin to those observed in the actual disease state.”

      The authors should clarify the language used in the abstract and introduction on the effect of hOPA1 DOA and DOA plus on the dOPA1- phenotype. Currently written as "none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired." but presumably the authors mean that these variants failed to rescue the dOPA1-deficient phenotype.

      We thank the reviewer for the constructive feedback. We acknowledge the need for clarity in our description of the effects of hOPA1 DOA and DOA plus mutations on the dOPA1- phenotype in both the abstract and the introduction. The current phrasing, “none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired,” may indeed be confusing. To address this concern, we have revised the statement to reflect our findings more accurately: “Previously reported mutations failed to rescue the dOPA1 deficiency phenotype.” In the abstract, we have made the following change: “we could not rescue any previously reported mutations known to cause either DOA or DOA plus.” → “mutations previously identified did not ameliorate the dOPA1 deficiency phenotype.”

      DOA plus is associated with a multiple sclerosis-like illness; as written it suggests that the pathogenesis of sporadic multiple sclerosis and that associated with DOA plus share an underlying pathogenic mechanism. Please use the qualifier "-like illness."

      We have added the term “multiple sclerosis-like illness” wherever “multiple sclerosis” is mentioned.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to elucidate the mechanisms that regulate the immune response under physiological conditions during cortical development. To achieve this goal, the authors used a wide range of mutant mice to analyse the consequences of immune activation for the formation of cortical ectopia.

      Strengths:

      The authors demonstrated that Abeta monomers are anti-inflammatory and inhibit microglial activation. This is a novel result that demonstrates the physiological role of APP in cortical development.

      Weaknesses:

      -On the other hand, cortical ectopia has been already described in mouse models in which the amyloid signalling has been disrupted (Herms et al., 2004; Guenette et al., 2006), making the current study less novel.

      We agree these previous studies have implicated amyloid precursor protein in cortical ectopia. However, since these studies use whole-body knockouts, they have not implicated the functional roles of specific cell types.  Nor have they identified the specific mechanisms underlying the formation of this unique class of cortical ectopia. In contrast, our studies show that the disruption of a novel Abeta-regulated signaling pathway in microglia is the primary cause of ectopia formation in this class of ectopia mutants. This is the first time that microglia have been specifically implicated in the development of cortical ectopia. We further show that elevated MMP activity and resulting cortical basement membrane degradation is the underlying mechanism leading to ectopia formation.  This is also the first time that MMP activity and basement membrane degradation (instead of maintenance) have been implicated in cortical ectopia development. As such, our results have provided novel insights into the diverse mechanisms underlying cortical ectopia formation in developmental brain disorders.

      One of the molecules analysed is Ric8a, a GTPase activator involved in neuronal development. Authors used the conditional mutant mice Emx1-Ric8a to delete Ric8a from early progenitors and glutamatergic neurons in the pallium. Emx1-Ric8a mutant mice present cortical ectopias and authors attributed this malformation to the increase in inflammatory response due to Ric8a deletion in microglia. Several discordances do not fit this interpretation:

      -The role of Ric8a in cortical development and function has been already described in several papers, but none of them has been cited in the current manuscript (Kask et al., 2015, 2018; Ruisu et al., 2013; Tonissoo et al., 2006).

      We will include reference to these publications in revision.

      -Ectopia formation in the cortex has been already described in Nestin-Ric8a cKO mice (Kask et al., 2015). In the current manuscript, authors analyzed the same mutant mice (Nestin-Ric8a), but they did not detect any ectopia. Authors should discuss this discordance.

      The expression pattern of nestin-cre is known to vary depending on factors including transgene insertion site, genetic background, and sex. Early studies show, for example, that the nestin gene promoter drives cre expression in many non-neural tissues in another transgenic line in the FVB/N genetic background (Dubois et al., Genesis. 2006 Aug;44(8):355-60. doi: 10.1002/dvg.20226). The specific nestin-cre line used in Kask et al., 2015 has also been shown to be active in brain microglia and to lead to increased microglia pro-inflammatory activity upon breeding to a conditional allele of a cholesterol transporter gene (Karasinska et al., Neurobiol Dis. 2013 Jun;54:445-55; Karasinska et al., J Neurosci. 2009 Mar 18;29(11):3579–3589). The ectopia reported in Kask et al., 2015 are also significantly more subtle than what we have observed and are apparently not observed in all mutant animals (we observe severe ectopia in every single emx1-cre mutant). We presume the ectopia reported in Kask et al., 2015 may result from a combined deletion of ric8a gene from microglia and neural cells due to unique combinations of factors affecting nestin-cre expression in a subset of mutants.

      -Authors claim that microglia express Emx1, and therefore, Ric8a is deleted in microglia cells. However, the arguments for this assumption are very weak and the evidence suggests that this is not the case. This is an important point considering that authors want to emphasise the role of Ric8a in microglia activation, and therefore, additional experiments should demonstrate that Ric8a is deleted in microglia in Emx1-Ric8a mutant mice.

      We have observed altered mRNA expression of several genes in purified microglia cultured from the emx1-cre mutants (Supplemental Fig. 8), which indicates that ric8a is deleted from microglia and suggests a role of microglial ric8a deficiency in ectopia formation.  This interpretation is further strengthened by the observation that deletion of ric8a from microglia using a microglia-specific cx3cr1-cre results in similar ectopia (Fig. 2). We also have other data supporting this interpretation, including data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a gene expression in microglia cells isolated from emx1-cre mutants. We will include these data in revision.

      Reviewer #2 (Public Review):

      Kwon et al. used several conditional KO mice for the deletion of ric8a or app in different cell types. Some of them exhibited pial basement membrane breaches leading to neuronal ectopia in the neocortex.

      They first investigated ric8a, a Guanine Nucleotide Exchange Factor for Heterotrimeric G Proteins. They observed the above-mentioned phenotype when ric8a is deleted from microglia and neural cells (ric8a-emx1-cre or dual deletion with cre combination cx3cr1 (in microglia) and nestin (in neural cells)) but not in microglia alone or neural cells alone (whether it is in CR cells (ric8a-Wnt3a-cre), post-mitotic neurons (nex-cre or dlx5/6-cre), or in progenitors and their progeny (nestin-cre or foxg1-cre). They also show that ric8a KO mutant microglia cells stimulated in vitro by LPS exhibit an increased TNFa, IL6 and IL1b secretion compared to controls (Fig 2). They therefore injected LPS in vivo and observed the neuronal ectopia phenotype in the ric8a-cx3cr1-cre (microglial deletion) cortices at P0 (Fig 2). They suggest that ric8a KO in neuronal cells mimics immune stimulation (but we have no clue how ric8a KO in neural cells would induce immune stimulation).

      We agree we do not currently know the precise mechanisms by which mutant microglia are activated in the mutant brain.  However, this does not affect the conclusion that deficiency in the Abeta monomer-regulated APP/Ric8a pathway in microglia is the primary cause of cortical ectopia in these mutants, since we have shown that genetic disruption of this pathway in microglia alone by different means targeting different pathway components, using cell type specific cre, all results in similar cortical ectopia phenotypes.  Regarding the source of the immunogens, there are several possibilities which we plan to investigate in future studies. For example, the clearance of apoptotic cells and associated cellular debris is an important physiological process and deficits in this process have been linked to inflammatory diseases throughout life (Doran et al., Nat Rev Immunol. 2020 Apr;20(4):254-267; Boada-Romero et al., Nat Rev Mol Cell Biol. 2020 Jul;21(7):398-414.).  In the embryonic cortex, studies have shown that large numbers of cell death take place starting as early as E12 (Blaschke et al., Development. 1996 Apr;122(4):1165-74; Blaschke et al., J Comp Neurol. 1998 Jun 22;396(1):39-50).  Studies have also shown that radial glia and neuronal progenitors play critical roles in the clearance of apoptotic cells and associated cellular debris in the brain (Lu et al., Nat Cell Biol. 2011 Jul 31;13(9):1076-83; Ginisty et al., Stem Cells. 2015 Feb;33(2):515-25; Amaya et al., J Comp Neurol. 2015 Feb 1;523(2):183-96). Moreover, Ric8a-dependent heterotrimeric G proteins have been found to specifically promote the phagocytic activity of both professional and non-professional phagocytic cells (Billings et al., Sci Signal. 2016 Feb 2;9(413):ra14; Preissler et al., Glia. 2015 Feb;63(2):206-15; Pan et al. Dev Cell. 2016 Feb 22;36(4):428-39; Flak et al. J Clin Invest. 2020 Jan 2;130(1):359-373; Zhang et al., Nat Commun. 2023 Sep 14;14(1):5706).  
      Thus, it is likely that the failure of radial glia to promptly clear apoptotic cells and debris may play a role in triggering microglial activation in ric8a mutants. We have not included discussion of these possibilities since the precise mechanisms remain to be determined. Moreover, they do not affect the conclusion of the current study.

      The authors then turned their attention on APP. They observed neuronal ectopia into the marginal zone when APP is deleted in microglia (app-cxcr3-cre) + intraperitoneal LPS injection (they did not show it, but we have to assume there would not be a phenotype without the injection of LPS) (Fig 3). (The phenotype is similar but not identical to ric8a-cx3cr1-cre + LPS. They suggest that the reason is because they had to inject 3 times less LPS due to enhanced immune sensitivity in this genetic background but it is only a hypothesis). After in vitro stimulation by LPS, app mutant microglia show a reduced secretion of TNFa and IL6 but not IL1b (this is the opposite to ric8a-cx3cr1-cre microglia cells) while peritoneal macrophages in culture show increased secretion of TNFa, IL1, IL6 and IL23 (fig 3 and Suppl. Fig 9).

      We have data showing that app-cxcr3-cre mutants without LPS injection do not show ectopia and will include them in revision. The reason we employ LPS injection in the first place is that we do not see a phenotype without it. We agree, and have also stated in the text, that the phenotype of the app mutants is not as severe as that of the ric8a mutant. Besides the low LPS dosage used, we also suggest that other app family members may compensate, since the ectopia reported previously in app family gene mutants was only observed in app/aplp1/2 triple knockouts and not in any of the double knockouts (Herms et al., 2004). These potential causes are also not mutually exclusive. Nonetheless, the microglia specific app mutants clearly show ectopia upon immune stimulation, implicating a role of microglial APP in cortical ectopia formation.

      The distinct response of ric8a and app microglia to LPS results from in vitro culturing of microglia. Indeed, we have shown that, when acutely isolated macrophages are used, these mutants show changes in the same direction (both increased cytokine secretion). The microglia used for analysis in this study have all been cultured in vitro for two weeks before assay. They have thus been under chronic stimulation from exposure to dead cells and debris in the culture dish throughout this period. Depending on the degree of perturbation to inflammation-regulating pathways, such exposures are known to significantly change microglial cytokine expression, sometimes in the opposite direction from that expected. For example, under chronic immune stimulation, while trem2+/- microglia, which are heterozygous mutant for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression as expected, trem2-/- (null) microglia under the same conditions not only fail to show increases but, for some pro-inflammatory cytokines, actually show decreases in expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). In several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that in microglia Ric8a-dependent heterotrimeric G proteins may also mediate only a subset of the signaling downstream of APP. As such, app knockout in microglia may have more severe effects than ric8a knockout on microglial immune activation and lead to changes in the opposite direction compared to ric8a knockout, as has been observed for trem2 null mutation vs heterozygosity discussed above.
This may explain the subdued TNF and IL6 secretion by cultured app mutant microglia.

      Amyloid beta (Ab) being one of the molecules binding to APP, the authors showed that Ab40 monomers (they did not test Ab40 oligomers) partially inhibit cytokines (TNFa, IL6, IL1b, MCP-1, IL23a, IL10) secretion in vitro by microglia stimulated by LPS but does not affect secretion by microglia from app-cx3cr1-cre (tested for TNFa, IL6, IL1b, IL23a, IL10) (Fig 4, Suppl fig 10) (but still does it in aplp2-cx3cr1-cre) and does not affect secretion by ric8a-cx3cr1-cre microglia (tested for TNFa and IL6 but still suppress IL1b) (Therefore here is another difference between app and ric8a KO microglia).

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and will include the data in revision. As mentioned above, in several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). We assume that this is likely also true in microglia and that Ric8a-dependent heterotrimeric G proteins may mediate only a subset of the signaling downstream of APP. This may explain the difference in the effects of app and ric8a knockout mutation in abolishing the anti-inflammatory effects of Abeta monomers on IL-1b vs TNF/IL-6. It also suggests that TNF/IL-6 and IL-1b secretion must be regulated by different mechanisms. Indeed, it is well established in immunology that the secretion of IL1b, but not of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit. Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found it suppressed neuronal ectopia (Fig 5, Suppl fig 11). It is not clear whether it suppresses immune stimulation from neuronal cells or immune reaction from microglia cells.

      We agree that at present the pharmacological approaches we have taken are not able to distinguish these possibilities. However, whichever of these possibilities turns out to be the case would still implicate a role of excessive microglial activation in the formation of cortical ectopia and support the conclusion of the study. Thus, while potentially worthy of further investigation, this question does not impact the conclusion of this study. Furthermore, as mentioned, we plan to determine the mechanisms of how ric8a mutation in neural cells induces immune activation in future studies. These results will likely enable us to adopt more specific approaches to address this question.

      Finally, the authors examined the activities of MMP2 and MMP9 in the developing cortex using gelatin gel zymography. The activity and protein levels of MMP9 but not MMP2 in the ric8a-emx1-cre cortex were claimed to be significantly increased (Fig 5, Suppl fig 12). Unfortunately, they did not show it in the app-cx3cr1-cre +LPS mouse. They make a connection between ric8a deletion and MMP9 but unfortunately do not make the connection between app deletion and MMP9, which is at the center of the pathway claimed to be important here. Then they injected BB94, a broad-spectrum inhibitor of MMPs, or an inhibitor specific for MMP9 and 13. They both significantly suppress the number and the size of the ectopia in ric8a mutants (Fig5).

      For all the gelatin gel zymography analysis, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amounts of proteins per lane. The results across lanes are thus directly comparable. From the quantification, our results clearly show that MMP9, but not MMP2, levels are increased in the mutants (supplemental Figure 12). The data on MMP2 also provide an internal control further supporting the observation of a specific change in MMP9. For this analysis, we focus on the ric8a-emx1-cre mutants since the app-cx3cr1-cre +LPS animals show less severe, more localized ectopia and in most cases only in one of the hemispheres. Any changes in MMP9 are therefore likely to be masked and the experiments unlikely to yield meaningful results. On the other hand, we have clearly shown that the administration of different classes of MMP inhibitors significantly eliminates ectopia in ric8a-emx1-cre mutants. This strongly implicates a functional contribution of MMPs.

      After reading the manuscript, I still do not know how ric8a in neural cells is involved in the immune inhibition. Is it through the control of Ab monomers? In addition, the authors did not show in vivo data supporting that Ab monomers are the key players here. As the authors said, this is not the only APP interactor. Finally, I still do not know how ric8a is linked to APP in microglia in the model.

      As detailed above, there are several possibilities including potential deficits in the clearance of apoptotic cells and associated debris that may trigger microglial activation in ric8a-emx1-cre mutants. We will investigate these possibilities in future studies. We have not included discussion since their roles remain to be determined. As for the role of Abeta monomers, we have indicated that we currently do not have evidence that in the developing cortex Abeta monomers play a role in inhibiting microglia. We have also indicated in the manuscript that our conclusion is that an Abeta monomer-activated microglial pathway regulates normal brain development, not that Abeta monomers themselves regulate brain development. Regarding the link between Ric8a and APP, the reviewer has missed several major lines of supporting evidence. For example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10). This inhibition is abolished when either the app or the ric8a gene is deleted from microglia. This indicates that app and ric8a act in the same pathway activated by Abeta monomers in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokines in microglia. This inhibition is also abolished when either the app or the ric8a gene is deleted from microglia. This reinforces the conclusion that app and ric8a act in the same pathway in microglia. Furthermore, cell type specific deletion of app or ric8a from microglia in vivo also results in similar phenotypes of cortical ectopia. Together, these results thus strongly support the conclusion that app and ric8a act in the same pathway activated by Abeta monomers in microglia.
This conclusion is also consistent with published findings that Ric8a dependent heterotrimeric G proteins bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      While several of the findings presented in this manuscript are of potential interest, there are a number of shortcomings. Here are some suggestions that could improve the manuscript and help substantiate the conclusions:

      (1) As the title suggests it, the focus is on Ab and APP functions in microglia. However, the analysis is more focused on ric8a. The connection between ric8a and APP in this study is not investigated, besides the fact that their deletion induces somewhat similar but not identical phenotypes. Showing a similar phenotype is not enough to conclude that they are working on the same pathway. The authors should find a way to make that connection between ric8a and app in the cells investigated here.

      As discussed above, the reviewer misses several major lines of evidence showing that APP and Ric8a act in the same pathway in microglia. For example, besides the similarity of the ectopia phenotypes, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10). These inhibitory effects are completely abolished when either the app or the ric8a gene is deleted from microglia. This indicates that app and ric8a act in the same pathway activated by Abeta monomers in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokine genes in microglia. These effects are again completely abolished when either the app or the ric8a gene is deleted from microglia. This further reinforces the conclusion that app and ric8a act in the same pathway in microglia. Moreover, we also show that the same results hold in macrophages. Together, these results therefore strongly support the conclusion that app and ric8a act in the same pathway in microglia. This conclusion is also consistent with published findings that Ric8a dependent heterotrimeric G proteins bind to APP and mediate APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      (2) This would help to show the appearance of breaches in the pial basement membrane leading to neuronal ectopia; to investigate laminin debris, cell identity, Wnt pathway for app-cxcr3-cre + LPS injection as you did for ric8a-emx1-cre.

      We will provide further data on the breaches in the pial basement membrane.  We have not observed any changes in cell identity or Wnt pathway activity in ric8a-emx1-cre mutants. The ectopia phenotype in the app-cxcr3-cre + LPS animals is also less severe.  It is therefore likely of limited value to examine potential changes in these areas.

      (3) As a control, this would help to show that app-cxcr3-cre without the LPS injection does not display the phenotype.

      We have the data on app-cx3cr1-cre mutants without LPS injection, which show no ectopia, and will include the data in revision.

      (4) This would help to show the activity and protein levels of MMP9 and MMP2 and perform the rescue experiments with the inhibitors in the app-cx3cr1-cre cortex +LPS.

      As discussed above, we focus analysis on the ric8a-emx1-cre mutants since app-cx3cr1-cre +LPS animals show less severe, more localized ectopia and in most cases only in one of the hemispheres. Determining potential changes in MMP9 levels and effects of MMP inhibitors is therefore not likely to yield useful data. On the other hand, we have shown that MMP9 levels are increased and that administration of different classes of MMP inhibitors eliminates cortical ectopia in ric8a-emx1-cre mutants. This strongly implicates a functional contribution of MMPs.

      (5) Is MMP9 secreted by microglia cells or neural cells?

      Our in situ hybridization data show MMP9 is most highly expressed in macrophage-like cells in the embryonic cortex, suggesting that microglia may be a major source of MMP9. We will incorporate these data in revision.

      (6) The in vitro evidence indicates that one of the multiple APP interactors, ie Ab40 monomers, is less effective in suppressing the expression of some cytokines by microglia cells mutants for ric8a (TNFa and IL6 but still suppress IL1b) or APP (TNFa, IL6, IL1b, IL23a, IL10) when compared to WT. But there are other interactors for APP. In order to support the claim, it seems crucial to have in vivo data to show that Ab40 monomers are the molecules involved in preventing the breach in the pial basement membrane.

      As addressed in detail above, we have indicated that our conclusion is that an Abeta monomer-activated microglial pathway regulates normal brain development, not that Abeta monomers themselves regulate brain development.  We currently do not have evidence that the Abeta monomers play a role in inhibiting microglia in the developing cortex.  There are candidate ligands for the pathway in the developing cortex, the functional study of which, however, is a major undertaking and beyond the scope of the current study.

      (7) In order to claim that this is specific to Ab40 monomers and not oligomers, it is necessary to show that the Ab40 oligomers do not have the same effect in vitro and in vivo. Also, an assay should be done to show that your Ab preparations are pure monomers or oligomers.

      We have tested the effects of Abeta40 oligomers, which induce instead of suppressing microglial cytokine secretion, and will include the data in revision. The protocols we use in preparing the monomers and oligomers are standard protocols employed in the field of Alzheimer’s disease research and have been optimized and validated repeatedly over the past several decades.  

      (8) Most of the cytokine secretion assays used microglia cells in culture. Two results draw my attention. Ric8a deletion increases TNFa and IL6 secretion after LPS stimulation in vitro on microglia cells while app deletion decreases their secretion. Then later, papers show that the decrease in IL1b induced by Ab on microglia cells is prevented by APP deletion but not ric8a deletion. Those two pieces of data suggest that ric8a and APP might not be in the same pathway. In addition, the phenotype from app-cxcr3-cre + LPS injection and ric8a-cxcr3-cre + LPS injection are not exactly the same. It could be due to the level of LPS as the author suggests or it might not be. More experiments are needed to prove they are in the same pathway.

      As discussed above, the reviewer misses several major lines of evidence, which strongly support the conclusion that APP and Ric8a act in the same pathway activated by Abeta monomers in microglia (see detailed discussion in point 1). The differential response of app and ric8a mutant microglia likely results from chronic immune stimulation during in vitro culturing, which is known to alter microglia cytokine expression (see detailed discussion in point 9 below on how chronic immune stimulation changes microglial cytokine expression). We have demonstrated this by showing that, without culturing, acutely isolated app and ric8a mutant macrophages both display elevated cytokine secretion (Figure 4). Regarding the distinct regulation of TNF/IL-6 and IL-1b by APP and Ric8a, as discussed above, in several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that this is also the case in microglia and that Ric8a-dependent heterotrimeric G proteins may mediate only a subset of the anti-inflammatory signaling activated by APP. This may explain why app, but not ric8a, mutation abolishes the inhibitory effects of Abeta monomers on IL-1b. This also suggests that the secretion of TNF/IL-6 and IL-1b must be regulated by different mechanisms. Indeed, it is well established in immunology that the secretion of IL1b, but not that of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit. Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      (9) How do the authors reconcile the reduced TNFa and IL6 secretion upon stimulation of app mutant microglia with the model where app is attenuating immune response in vivo? Line 213 says that microglia exhibit attenuated immune response following chronic stimulation but I don't know if 3 hours of LPS in vitro is a chronic stimulation.

      The reviewer has misunderstood. The microglia used in this study have all been cultured in vitro for approximately two weeks before assay. They have thus been under chronic stimulation from exposure to dead cells and debris in the culture dish throughout this period. Depending on the degree of perturbation to inflammation-regulating pathways, such exposures are known to significantly change microglial cytokine expression, sometimes in the opposite direction from that expected. For example, under chronic immune stimulation, while trem2+/- microglia, which are heterozygous mutant for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression as expected, trem2-/- (null) microglia under the same conditions not only fail to show increases but, for some pro-inflammatory cytokines, actually show decreases in expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). As mentioned, in several systems, Ric8a-dependent heterotrimeric G proteins have also been shown to bind to APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that Ric8a-dependent heterotrimeric G proteins also mediate only a subset of the anti-inflammatory signaling activated by APP in microglia. As such, app knockout in microglia may have more severe effects than ric8a knockout on microglial immune activation, similar to the relationship between trem2 null mutation vs heterozygosity discussed above. This likely explains why TNF and IL6 secretion by cultured app mutant microglia is subdued. In contrast, we find that acutely isolated app mutant macrophages show increased cytokine secretion. This is likely more representative of the response of app mutant microglia in the absence of chronic stimulation.

      (10) Line 119: In their model, the authors suggest that there is a breach in pial basement membrane but that the phenotype is different from the retraction of the radial fibers due to reduced adhesion. So, could the author discuss to what substrate the radial fibers are attached to, in their model where the pial surface is destroyed?

      Radial glial endfeet normally bind to the basement membrane via cell surface receptors including the integrin and the dystroglycan protein complexes. We observe free radial glial endfeet at the breach sites, apparently without attachment to any basement membrane. However, we cannot exclude the possibility that there may be residual basement membrane components not detected by the methodology employed.

      (11) The authors should show that the increased cytokine secretion observed in vitro is also happening in vivo in ric8a-emx1-cre compared to WT mice and compared to ric8a-nestin-cre mice. Or when app is deleted in microglia (app-cxcr3-cre) + LPS injection compared to WT mice +LPS.

      Unfortunately, this is not technically feasible since it is impossible to extract the extracellular (secreted) fractions of cytokines from an embryonic brain without causing cell lysis and the release of the intracellular pool. This, however, does not affect our conclusion that the Abeta monomer-regulated microglia pathway plays a key role in regulating normal brain development, since its genetic disruption, by different approaches, clearly results in brain malformation.

      (12) The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found that it suppressed neuronal ectopia (Fig 5, Suppl fig 11). Does it suppress immune stimulation from neuronal cells or immune reaction from microglia cells?

      As discussed above, we agree that at present the pharmacological approaches we have taken are not able to distinguish these two possibilities. However, no matter which possibility is true, it does not affect our conclusion. Furthermore, we also plan to determine the mechanisms of how ric8a mutation in neural cells induces immune activation in future studies. These results will likely enable us to adopt specific approaches to address this question.

      (13) Fig 5 and Supplementary fig 12: Please show a tubulin loading control in Fig 5i as you did in suppl fig 12 d (gel zymography). Please provide a gel zymography showing side by side Control, mutant and mutant +DM/S3I treatment. The same request for the MMP9 staining. Please provide statistics for control vs mutant for suppl fig 12c and d.

      For all experiments of the gelatin gel zymography analysis, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amounts of proteins per lane. The results across lanes are thus all comparable.  These experiments were also performed several years ago before the pandemic and we unfortunately no longer have the samples.  We will, however, provide the protein quantification information in revision.  The MMP9 staining images for the controls and mutants have also all been taken with the same parameters on the microscope and can be directly compared.  The statistics will be provided as suggested.

      (14) Please provide the name and the source of the MMP9/13 inhibitor used in this study.

      This inhibitor is MMP-9/MMP-13 inhibitor I (CAS 204140-01-2), from Santa Cruz Biotechnology. This information will be included in revision.

      (15) The results show that deletion of ric8a in microglia and neural cells induced pia membrane breaches but no phenotype is apparent in ric8a deletion in microglia or neural cells alone. Then, the results showed that intraperitoneal injection of LPS induced the phenotype in ric8a-cxcr3-cre mutants. It would be beneficial as a control supporting the model to show that the insult induced by LPS injection does not induce the phenotype in the ric8a-foxg1-cre mice.

      We agree it may potentially be useful to show that LPS injection does not induce ectopia in ric8a-foxg1-cre mice.  Unfortunately, since the ric8a-foxg1-cre mutation shows no phenotype, we are no longer in possession of this line.

      Reviewer #1 (Recommendations For The Authors):

      -The information in the abstract and the introduction is only related to app. So, it is very abrupt how authors start the manuscript studying the role of Ric8a, with no information at all about this protein and why the authors want to investigate this role in microglial activation. Later in the manuscript, the authors tried to link Ric8a with app to study the role of app in the inflammatory response and ectopia formation. This link is quite weak as well.

      In the last paragraph of the Introduction, we explain the use of the ric8a mutant and how it leads to discovery of the Abeta monomer-regulated pathway. We will improve the writing in revision to make these points clearer. We will also improve the writing on the potential link of Ric8a to APP by highlighting, especially, the fact that ric8a and app pathway mutants are among a unique group of only three mouse mutants (ric8a, app/aplp1/2, and apbb1/2) that show cortical ectopia exclusively in the lateral cortex, while all other cortical ectopia mutants show the most severe ectopia at the midline.

      -In order to validate the mouse model, double immunofluorescence or immunofluorescence+in situ hybridization should be performed to show that microglia express ric8a and that is eliminated in the Emx1-Ric8a mutant mice.

      As mentioned above, we have additional lines of evidence showing that ric8a is deleted from microglia in emx1-cre mutants. This includes data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a gene expression in microglia cells isolated from emx1-cre mutants.  We will include these data in revision.

      -In Supplemental Fig. 6, the authors claimed that cell proliferation is normal in Ric8a mutant mice without doing any quantification. They also quantified the angle of mitotic division of progenitors in the ventricular zone, but there are no images for the spindle orientation quantification, and no description of how they did it. In addition, this data is contrary to what has already been published in conditional Ric8a mutant mice (Kask et al., 2015). The Vimentin staining should be improved.

      We will provide quantification of cell proliferation in revision. We will also provide details on the quantification on mitotic spindle orientation.  We are not sure why the results are different from the other study. We were indeed anticipating deficits in mitotic spindle orientation and spent major efforts in the analysis.  However, based on the data, we could not draw the conclusion.

      -Analysis of the MMP9 expression should be done by western blot and not by immunofluorescence. In fact, the MMP9 expression shown in Figure 5g,h, does not correspond with RNA expression shown in gene expression atlas like genepaint or the allen atlas, doubting the specificity of the antibody. The expression of Mmp9 is quite low or absent in the cortex at E13.5-E14.5, making this protein very unlikely to be responsible for laminin degradation during development.

      We perform gelatin gel zymography on MMP2/9, which shows increased MMP9 activity levels in the mutant cortex. This is similar to Western blot analysis (all lanes are loaded with the same amounts of cortical lysates). The immunofluorescence staining, a different type of analysis, was designed as a complementary approach. Regarding RNA expression, please also note that MMP9 is a secreted protein and the protein expression pattern is expected to be different from that of RNA. We also have in situ data showing that, while MMP9 mRNA is indeed low, it is strongly expressed in macrophage-like cells most prominently in cortical blood vessels at E12-E13 (we will include these data in revision). We suspect that these cells are microglial lineage cells populating the embryonic cortex at this stage (see, for example, Squarzoni et al., Cell Rep. 2014 Sep 11;8(5):1271-9. doi: 10.1016/j.celrep.2014.07.042.) and may be a major source of cortical MMP9. As for functional contributions, we agree that we cannot rule out roles played by other MMPs. However, based on the ectopia suppression data, our results clearly indicate a key functional contribution by MMP9/13.

      For MMP9 activity, authors should show the whole membrane with a minimum of three control and three mutant individual samples and with the quantification.
      -The graphs should be improved, including individual values and titles of the Y axes.

      We will include these data in revision (the quantification of MMP9 activity is provided in Supplemental Figure 12d) and improve the graphs as suggested.

    1. Author response:

      We thank the reviewers for their feedback and will work to address it in our revision. We appreciate their recognition of the value of the dataset and will continue to strive to make it useful to the community.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Functional validation is indeed the focus of our current studies, where we are carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate conversion of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein. 

      We will revise the manuscript accordingly, to address the three concerns raised by the reviewer, in the context of the ongoing studies mentioned above. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study, based on their previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to answer whether TR1 differentiation from TFH is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH to TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes. 

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis of epigenetics covers chromatin open regions, histone modifications, and DNA methylation, and uses both single-cell and bulk techniques to validate their findings. As for their results, observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the similarity between TFH and TR1 in their epigenetics is intriguing. 

      (2) They performed correlation analysis to answer the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". Also, they have made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research. 

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. We will discuss the limitations and opportunities that this research provides more explicitly in a revised manuscript to provide a clearer context for the scope and applicability of our findings.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). This will be clarified in a revised version of the manuscript.

      (2) This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 are highly similar. However, they did not discuss the differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, IL10 H3K27ac modification in Tet+ cells showed significantly higher intensity than KLH-TFH, while in this article, it may be categorized as "possessing same histone modification region". This will strengthen their conclusions.

      We appreciate your suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we report the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, most of the epigenetic modifications that TR1 cells inherit from TFH cells had already been acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. However, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells. 

      We acknowledge that peak calling alone does not account for intensity variations of histone modifications. However, our analysis includes both qualitative and quantitative assessments to ensure robust conclusions. We will edit the relevant sections of the manuscript to clarify these points and better communicate our methodology and findings to the readers.

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand this reviewer’s concern about the potential redundancy of some results and figures. The goal of including these analyses is to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, chromatin accessibility and transcriptional differences. This comprehensive approach helps ensure that our conclusions are robust and well-supported by the data; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts. These details intend to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive datasets that are presented in this paper will serve as a valuable resource for others in the field who may seek to build upon our findings.

      That said, we will carefully review the manuscript to identify and streamline any elements that may be overly redundant. We will consider consolidating figures and refining the text to ensure that the paper remains concise and focused while retaining the depth of analysis that we believe is essential.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something?

      We thank the reviewer for pointing this out. After careful consideration, we agree with your point of view. According to the results of Melnick 2013, the motion surround suppression (SI) and the time thresholds of small and large gratings representing hMT+ functionality are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ does have the potential to become the core of the MD system. However, because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated through the frontal cortex”, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We have therefore adjusted the explanatory logic of the article. Briefly, we emphasize the de-redundancy function of hMT+ in visual-spatial intelligence and the improvement of information processing efficiency, while giving less weight to the significance of hMT+ in the MD system.

      (2) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r=0.47). Here we used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26. This ensures that our study has reasonable power to yield valid statistical results. Furthermore, compared to earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficiently large dataset.
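
      The sample-size reasoning above can be illustrated with a minimal sketch. It uses the textbook Fisher z approximation for the power of a correlation test, not G*Power's own routine, so its output may differ from G*Power by a participant or two. The function name and the `two_sided` switch are assumptions introduced for illustration; the response does not state whether a one- or two-tailed test was specified, so both are computed.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size needed to detect a correlation r.

    Uses the Fisher z approximation (hypothetical helper, not G*Power):
        n = ((z_alpha + z_beta) / atanh(r))^2 + 3
    """
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)   # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)       # quantile matching the target power
    effect = math.atanh(r)          # Fisher z-transform of r
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


# Inputs taken from the response: r = 0.47, power = 0.8, alpha = 0.05.
n_two_sided = n_for_correlation(0.47, two_sided=True)
n_one_sided = n_for_correlation(0.47, two_sided=False)
print("two-sided:", n_two_sided, "one-sided:", n_one_sided)
```

      Comparing the two printed values with the reported sample size is a quick way to check which tail setting a reported power analysis is consistent with.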

      (3) In Schallmo elife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. There are several differences between the two studies:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilized 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. The reasons for focusing on the left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1-all 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We have made a correlation matrix reporting all values (Figure supplement 9).
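
      The reviewer's count of 6x5/2 = 15 pairwise correlations can be checked by enumerating the pairs directly. This is only an illustrative sketch: the short labels below are shorthand introduced here for the six measurements named by the reviewer, not identifiers from the manuscript.

```python
from itertools import combinations

# Shorthand labels (assumed for illustration) for the six measurements:
# SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1.
measures = ["SI", "BDT", "GABA_MT", "GABA_V1", "Glu_MT", "Glu_V1"]

# Every unordered pair corresponds to one cell of the upper triangle
# of a 6 x 6 correlation matrix.
pairs = list(combinations(measures, 2))

print(len(pairs))
for first, second in pairs:
    print(first, "vs", second)
```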

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well-established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we wanted to keep the ROI consistent across the different models. The use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results does help researchers compare Melnick's results with ours. We have added these figures in the revised version (Figure 3f, g).

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013; however, the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, along with an explanation of how it fits with the core/extended MD networks.

      We really appreciate the reviewer’s comments. Our reasons for focusing on hMT+ are as follows. According to the results of Melnick 2013, the motion surround suppression (SI) and the time thresholds of small and large gratings representing hMT+ functionality are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. 2013, the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assume that hMT+ does have the potential to become the core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree with your point of view. Because our results address only visuo-spatial intelligence, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy function of hMT+ in visual-spatial intelligence and the improvement of information processing efficiency, while giving less weight to the significance of hMT+ in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavioral model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavioral model. However, we acknowledge that our initial presentation in the introduction may not have clearly articulated our hypothesis, potentially leading to misunderstandings. We have revised the introduction to better clarify these connections and ensure the relevance of the Suppression Index is comprehensively understood.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have inconsistently presented this construct across different sections, causing confusion. To address this, we ensured a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion sections. However, we retained ‘Block Design task score’ in the results section to make clear which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We have revised Figure 1a to make it less abstract and more logical. The schematic in Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements within this framework are grounded in related theories and supported by evidence discussed in our results and discussion sections, making them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thank you for pointing this out. Reviewer #1 also asked this question; we copy our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well-established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thank you for pointing this out. We have carefully read the corresponding references and considered the corresponding theories, and we agree with these comments. Because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy function of hMT+ in visual-spatial intelligence and the improvement of information processing efficiency, while giving less weight to the significance of hMT+ in the MD system. In addition, regarding the potential central role of hMT+ in the MD system, we agree with your view that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports the role of hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci, 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci, 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by hMT+'s features.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge the inappropriate citation of "Born & Bradley, 2005," which focuses solely on the structure and function of the visual area MT. However, we believe that choosing hMT+ as the domain for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu Rev Neurosci, 24:203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex. This supports our choice and emphasizes the relevance of hMT+ in our study. We have corrected this reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. We have placed this correlation in the main text (Figure 3e).

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We have drawn the V1 ROI MRS scanning area (Figure supplement 1). Using the template, we checked the coverage of V1, V2, and V3. Although the MRS overlap regions extend to V2 (3%) and V3 (32%), the major coverage of the MRS scanning area is in V1, with 65% overlap across subjects.
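
      A coverage check of this kind amounts to counting how many positions inside the MRS volume carry each atlas label. The sketch below shows that calculation only; the function name is hypothetical, the toy labels are invented for illustration and are not the study's data, and the template and software actually used are not specified in the response.

```python
from collections import Counter


def coverage_by_region(voxel_labels):
    """Return the fraction of MRS-volume positions assigned to each region.

    voxel_labels: one atlas label per position inside the MRS volume
    (hypothetical input format, assumed for illustration).
    """
    counts = Counter(voxel_labels)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


# Invented toy labels for ten positions; NOT the study's data.
toy_labels = ["V1"] * 7 + ["V3"] * 2 + ["V2"] * 1
fractions = coverage_by_region(toy_labels)

for region, fraction in sorted(fractions.items()):
    print(f"{region}: {fraction:.0%}")
```

      Averaging these per-subject fractions would give an across-subject overlap summary of the kind quoted above, assuming the same labeling scheme is applied to every participant.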

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We have run the V1 FC-behavior analysis as a control (Figure supplement 7). Only positive correlations were detected in the frontal area, suggesting that V1 contributes to feedforward information processing in the 3D visuo-spatial intelligence task. By contrast, hMT+, which showed specific negative correlations with frontal regions, is involved in the inhibition mechanism. These results further emphasize the de-redundancy function of hMT+ in 3D visuo-spatial intelligence.

      Regarding the mediation analysis, because the GABA/Glu concentration in V1 is not correlated with BDT score, there is no basis for applying a mediation analysis to V1.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We have further interpreted the difference between panels a and b in the revised version. Panel a shows the hMT+-region FC that correlates with BDT score, which clearly involves the frontal cortex. Panel b shows the hMT+-region FC that correlates with SI, which covers fewer regions. The overlap between the two is the region we are interested in, because it explains how local inhibitory mechanisms work in 3D visuo-spatial intelligence. In addition, we have revised Figure 4 to mark the overlap region.

      (5) SI is not relevant to the authors‘ priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms, on the psychological level, function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA shows no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could lead to confusion, and we have deleted it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We have attached the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We have revised it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We now use hMT+ throughout, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We have revised it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We have revised it.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The figures and tables should be substantially improved.

      We thank the reviewer for pointing this out. We have improved some of the figures’ quality.

      (2) Please explain the sample size, and the difference between Schallmo eLife 2018, and Melnick, 2013.

      We thank the reviewer for pointing this out. These questions are answered in the public review. We copy the answer in the public review.

      (2.1)  How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r=0.47). We used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26. This ensures that our study has adequate power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 subjects to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficiently large dataset.
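      For readers who wish to check a calculation of this kind without G*Power, the sketch below uses the standard Fisher z approximation for the sample size needed to detect a correlation. It is an approximation and not G*Power's exact routine, so its result can differ slightly from the value of 26 reported above. The function name is hypothetical, and the one- versus two-tailed setting is exposed as a parameter because it is an assumption of the sketch.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate N needed to detect correlation r (Fisher z approximation)."""
    z = NormalDist()
    # Critical value for the chosen alpha and number of tails.
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    # Quantile corresponding to the desired power.
    z_beta = z.inv_cdf(power)
    # Fisher z-transform of the expected correlation.
    effect = math.atanh(r)
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)

n_one_tailed = n_for_correlation(0.47, two_tailed=False)
n_two_tailed = n_for_correlation(0.47, two_tailed=True)
```

      The required N depends noticeably on the tail setting, so it is worth stating which one was used alongside r, alpha, and power.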

      (2.2)  In Schallmo et al. (eLife, 2018), there was no correlation between GABA concentration and SI. How can we justify the different results here?

      Thank you to the reviewer for pointing this out. There are several differences between the two studies, ours and theirs:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used left hMT+. The reasons for focusing on left hMT+ are described in our response to Reviewer 1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white matter signal.

      (3) Table 1 and Table Supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table Supplementary 2?

      (3.1) what are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that left hMT+ is involved in the efficient information integration network during the BDT task. In addition, surround suppression in left hMT+ is associated with several hMT+-frontal connections. Furthermore, the overlapping regions between the two tasks indicate a shared underlying mechanism.

      (3.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we report all values there. In Table 2 and Table Supplementary 4-6, by contrast, we highlight in bold the significant correlations that survived correction for multiple comparisons.

      (3.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out. This was a mistake. We have revised the table and deleted the significance symbols.

      (4) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised “the canonical theory” to “the prevailing opinion”.

      (5) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised them and used "hMT+" to be consistent with the human fMRI literature.

      (6) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the Results section.

      (7) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have deleted the inappropriate sentence "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area".

      (8) There are no unit labels for all x- and y-axes in Figure 1. I only see the unit for Conc is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for the x- and y-axes are not needed. We believe this comment may refer to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit because it varies with the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (9) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised them.

      (10) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. The correlation figures in Supplementary Figure 1 indicate that GABA in V1 does not show any correlation with BDT and SI, illustrating that inhibition in V1 is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 2 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or V1. Supplementary Figure 3 validates our MRS measurements. Supplementary Figure 4 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (11) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly.

      (12) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the Results section.

      (13) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (14) Figure 5 is too small. The items in plot a and b can be barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      I highly recommend editing the manuscript for readability and the use of the English language. I had significant difficulties following the rationale of the research due to issues with the way language was used.

      We thank the reviewer for pointing this out. We apologize for any shortcomings in our initial presentation. We have invited a native English speaker to revise our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      Heer and Sheffield used 2 photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.  

      Strengths:  

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis of the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.  

      Weaknesses:  

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.  

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons).

      Additionally, 2 different calcium indicators with differential kinetics and sensitivity to calcium changes (GCaMP6S and GCaMP7b) were used (n=3, n=4 respectively) and the data pooled. This makes it very challenging to draw any valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to be showing a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.  

      We appreciate the reviewer's detailed feedback regarding the analysis of VTA axons in our dataset. The relatively low sample size for VTA axons is due to their sparsity in the dCA1 region of the hippocampus and the inherent difficulty in recording from these axons. VTA axons are challenging to capture due to their low baseline fluorescence and long-range axon segments, resulting in a typical yield of only a single axon per field of view (FOV) per animal. In contrast, LC axons are more abundant in dCA1.

      To address the disparity in sample sizes between LC and VTA axons, we down-sampled the LC axons to match the number of VTA axons, repeating this process 1000 times to create a distribution. However, we acknowledge the reviewer's concern that the relatively low sample size for VTA axons might result in insufficient sampling of this population. Increasing the baseline expression of GCaMP to record from VTA axons requires several months, limiting our ability to quickly expand the sample size.
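      The down-sampling procedure described above can be sketched as follows. This is a minimal illustration rather than our analysis code: it assumes one summary value per axon (for example, a velocity correlation), draws one axon per imaging session as described in our randomization procedure, and uses hypothetical names and toy values throughout.

```python
import random
from statistics import mean

def downsample_distribution(lc_by_session, n_vta, n_iter=1000, seed=0):
    """Build a distribution of LC means from repeated down-sampling.

    lc_by_session maps a session id to the per-axon values recorded in it.
    Each draw picks n_vta sessions and one axon per session, so n_vta must
    not exceed the number of sessions.
    """
    rng = random.Random(seed)
    session_ids = sorted(lc_by_session)
    draws = []
    for _ in range(n_iter):
        chosen = rng.sample(session_ids, n_vta)
        draws.append(mean(rng.choice(lc_by_session[s]) for s in chosen))
    return draws

# Hypothetical toy values: two axons in each of twelve sessions.
toy_lc = {f"session_{i}": [0.1 * i, 0.1 * i + 0.05] for i in range(12)}
lc_draws = downsample_distribution(toy_lc, n_vta=9)
```

      The observed VTA mean can then be compared with this distribution, for instance by asking what fraction of the down-sampled LC means are at least as extreme.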

      In response to the reviewer's comments, we have added recordings from 2 additional VTA axons, increasing the sample size from 7 to 9. We re-analyzed all data from the familiar environment with n=9 VTA axons, comparing them to down-sampled LC axons as previously described. However, the additional axons were not recorded in the novel environment. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative. To address this, we have revised the description of our results to include the following sentence:

      “However, 1 VTA ROI showed an increase in activity immediately following exposure to novelty, indicating heterogeneity across VTA axons in CA1, and the lack of a novelty signal on average may be due to a small sample size.”

      Regarding the use of two different GCaMP constructs, we understand the reviewer's concern. We used GCaMP6s and GCaMP7b variants to determine if one would improve the success rate of recording from VTA axons. Given the long duration of these experiments and the low yield, we pooled the data from both GCaMP variants to increase statistical power. However, we recognize the importance of verifying that there are no differences in the signals recorded with these variants.

      With the addition of 2 VTA DA axons expressing GCaMP6s, we now have n=5 GCaMP6s and n=4 GCaMP7b VTA DA axons. This allowed us to compare the activity of the two sensors in the familiar environment. As shown in new Supplementary Figure 2, both sets of axons responded similarly to the variables measured: position in VR, time to motion onset, and animal velocity (although the GCaMP6s-expressing axons showed stronger correlations). Since all LC axons recorded expressed GCaMP6s, we also specifically compared VTA GCaMP6s axons to LC GCaMP6s axons (Supplementary Figure 3). Our conclusions remained consistent when comparing this subset of VTA axons to LC axons.

      Overall, our paper now includes comparisons of combined VTA axons (n=9) and separately the GCaMP6s-expressing VTA axons (n=5) with LC axons. Both datasets support our initial conclusions that VTA axons signal proximity to reward, while LC axons encode velocity and motion initiation in familiar environments.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.  

      We appreciate the reviewer's insightful suggestion to analyze LC axon activity during periods of immobility to distinguish between changes induced by novelty and those induced by motion. This additional analysis would indeed strengthen our conclusions regarding the LC novelty signal.

      In response to this suggestion, we performed the same analysis as before, but focused on periods of immobility. Our findings indicate that following exposure to novelty, there was a significant increase in LC activity specifically during immobility. This supports the idea that LC axons produce a novelty signal that is independent of novelty-induced behavioral changes. The results of this analysis are now presented in new Supplementary Figure 5b.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.  

      This is an insightful comment regarding the potential role of VTA DA axons in signaling temporal information. We agree that VTA DA axons could indeed be encoding temporal information, as previous work from our lab has shown that these axons exhibit ramping activity when averaged by time to reward (Krishnan et al., 2022).

      To address this, we have now examined DA axon activity relative to time to reward, as shown in new Supplementary Figure 4. Our analysis confirms that these axons ramp up in activity relative to time to reward. Given the periodicity of our mice's behavior in these experiments, as the reviewer correctly points out, we are unable to distinguish between spatial proximity to reward and time to reward. We have added a sentence to our paper highlighting this limitation and stating that further experiments are necessary to differentiate these two variables.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.  

      We appreciate the reviewer's insightful comment regarding the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments. The choice of a 3m track for LC axon recordings was made to align with a previous experiment from our lab (Dong et al., 2021), in which mice were exposed to a novel 3m track while CA1 pyramidal cell populations were recorded. In that study, we detailed the time course of place field formation within the novel track. Our current hypothesis is that LC axons signal novelty, and we aimed to investigate whether the time course of LC axon activity aligns with the time course of place field formation. This hypothesis, and the potential role of LC axons in facilitating plasticity for new place field formation, is further discussed in the Discussion section of our paper.

      For the VTA axon recordings, we utilized a 2m track, consistent with another recent study from our lab (Krishnan et al., 2022), where reward expectation was manipulated, and CA1 pyramidal cell populations were recorded. By matching the track length to this prior study, we aimed to explore how VTA dopaminergic inputs to CA1 might influence CA1 population dynamics along the track under conditions of varying reward expectations.

      We acknowledge that using different track lengths for LC and VTA recordings introduces a variable that could potentially confound direct comparisons. To address this, we normalized the track lengths for our LC versus VTA comparison analysis. This normalization allowed us to directly compare patterns of activity across the two types of axons by adjusting the data to a common scale, thereby ensuring that any observed differences or similarities are attributable to the intrinsic properties of the axons rather than differences in track lengths. By doing so, we could assess relative changes in activity levels at matched spatial bins.
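      The normalization described above amounts to expressing position as a fraction of track length before binning, so that the same bin index refers to the same relative location on both tracks. The sketch below illustrates the idea under stated assumptions: the function name, the number of bins, and the toy samples are hypothetical and are not taken from our analysis pipeline.

```python
def binned_activity(positions_m, activity, track_length_m, n_bins=50):
    """Average activity in equal-width bins of normalized position (0 to 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, act in zip(positions_m, activity):
        # Express position as a fraction of the track, clipped to [0, 1].
        frac = min(max(pos / track_length_m, 0.0), 1.0)
        # The last bin is closed on the right so the track end is included.
        idx = min(int(frac * n_bins), n_bins - 1)
        sums[idx] += act
        counts[idx] += 1
    # Bins that were never visited are reported as NaN.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Toy samples at the same relative positions on a 2 m and a 3 m track.
vta_bins = binned_activity([0.1, 1.0, 1.9], [1.0, 2.0, 3.0], 2.0, n_bins=4)
lc_bins = binned_activity([0.15, 1.5, 2.85], [1.0, 2.0, 3.0], 3.0, n_bins=4)
```

      With this scaling, the midpoint of the 2 m track and the midpoint of the 3 m track land in the same bin, which is what allows activity to be compared at matched spatial bins.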

      Although the experiences of the animals on the different track lengths are not identical, our observations suggest that LC and VTA axon signals are not strongly influenced by variations in track length. LC axons are associated with velocity and a pre-motion initiation signal, neither of which is affected by track length. VTA axons, which also correlate with velocity, can be compared to LC axon velocity signals because mice reach maximal velocity very quickly along the track, well before the end of the 2m track. The range of velocities is therefore captured on both track lengths. While VTA axons exhibit ramping activity as they approach the reward zone (a signal potentially modulated by track length), LC axons do not show such ramping-to-reward signals. Thus, a comparison across different track lengths is justified for this aspect of our analysis.

      To further enhance the rigor of our comparisons between axon dynamics recorded on 2m and 3m tracks, we conducted an additional analysis plotting axon activity by time to reward and actual (un-normalized) distance from reward (Supplementary Figure 4). This analysis revealed very similar signals between the two sets of axons, supporting our initial conclusions.

      We thank the reviewer for raising this important point and hope that our detailed explanation and additional analysis address their concern.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):  

      Summary:  

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.  

      The main findings were as follows:  

      - In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.  

      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.  

      - In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.  

      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.  

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.  

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.  

      Strengths:  

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.  

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.  

      Weaknesses:  

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.  

      (2) Some aspects of the methodology would benefit from clarification.  

      First, to help others to better scrutinize, evaluate, and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have addressed each one (and point out any new additions):

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - we state both the number of ROIs and mice for each analysis. We have now also added the number of mice we observed specific types of activity in. 

      (3) Inclusion/exclusion criteria - The following has now been added to the Methods section: Out of the 36 NET-Cre mice injected, 15 were never recorded from, either because they failed to reach behavioral criteria or because of a lack of visible expression in axons. Out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 because of a lack of expression or failure to reach behavioral criteria. Of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 10 DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions, followed by using the registration metrics output by suite2p. This registration metric conducts a PCA on the motion-corrected ROIs and plots the first PC. If the PC drifted substantially, to the point where no activity was apparent, the video was excluded from analysis.

      (4) Randomization - Already included in the paper is a description of random downsampling of LC axons to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times and comparisons were made against this random distribution. 

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version. 

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and axon signaling. 

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in great detail in this version of the paper. 

      (8) Experimental animals - We have now described that DAT-Cre mice were obtained through JAX labs, and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al. 2017). This was absent in the initial version of the paper.

      (9) Experimental procedure - Already listed in great detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.
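
      The random downsampling described under point (4) can be illustrated with a minimal sketch. It assumes a hypothetical mapping from imaging session to per-axon values and uses the mean as the summary statistic; the function names, the data layout, and the choice of statistic are illustrative assumptions, not the analysis code used in the study.

```python
import random
from statistics import mean


def downsample_one_per_session(axons_by_session, n_sessions, rng):
    """Draw n_sessions distinct sessions, then one axon value from each."""
    sessions = rng.sample(sorted(axons_by_session), n_sessions)
    return [rng.choice(axons_by_session[s]) for s in sessions]


def null_distribution(axons_by_session, n_sessions, n_iter=1000, seed=0):
    """Repeat the downsampling n_iter times and keep the mean of each draw."""
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    return [
        mean(downsample_one_per_session(axons_by_session, n_sessions, rng))
        for _ in range(n_iter)
    ]


def fraction_at_least(null, observed):
    """Fraction of resampled means that are >= the observed value."""
    return sum(v >= observed for v in null) / len(null)


# Hypothetical per-axon values grouped by imaging session (illustration only).
lc_axons = {"s1": [0.1, 0.2], "s2": [0.3], "s3": [0.4, 0.5, 0.6], "s4": [0.2]}
null = null_distribution(lc_axons, n_sessions=3)
print(len(null), fraction_at_least(null, 0.35))
```

      Drawing at most one axon per session keeps each resampled set comparable to a population in which every session contributes a single axon, which is the property the matching is meant to preserve.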

      Wagatsuma, Akiko, Teruhiro Okuyama, Chen Sun, Lillian M. Smith, Kuniya Abe, and Susumu Tonegawa. “Locus Coeruleus Input to Hippocampal CA3 Drives Single-Trial Learning of a Novel Context.” Proceedings of the National Academy of Sciences 115, no. 2 (January 9, 2018): E310–16. https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?  

      We thank the reviewer for pointing this out and giving us a chance to address it directly. A detailed response to this is written above for a similar comment from reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.  

      This is a typo leftover from before we converted velocity from rotational units of the treadmill to cm/s. This has now been corrected.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      We appreciate the reviewer's insightful comment regarding the potential impact of decreased velocity on novelty responses in LC and VTA axons. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons and could mask a subtle novelty signal in VTA axons. We have now included the following points in our discussion:

      “In addition, as noted above, on average we did observe a velocity associated signal in VTA axons. When mice were exposed to the novel environment their velocity initially decreased. This would be expected to reduce the average signal across the VTA axon population relative to the higher velocity in the familiar environment. It is possible that this decrease could somewhat mask a subtle novelty induced signal in VTA axons. Therefore, additional experiments should be conducted to investigate the heterogeneity of these axons and their activity under different experimental conditions during tightly controlled behavior.”

      “As discussed above, the slowing down of animal behavior in the novel environment could have decreased LC axon activity and reduced the magnitude of the novelty signal we detected during running. The novelty signal we report here may therefore be an underestimate of its magnitude under matched behavioral settings.”

      However, it is important to note that although VTA axons, on average, showed activity modulated by velocity in a familiar rewarded environment, this relationship was largely due to the activity of two VTA axons that were strongly modulated by velocity, indicating heterogeneity within the VTA axon population in dCA1. We have highlighted this point in the discussion. We also discuss that:

      “It is possible that some VTA DA inputs to dCA1 respond to novel environments, and the small number of axons recorded here are not representative of the whole population.”

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.  

      Mice receive their water reward through a water spout that is immobile and positioned directly in front of their mouth. Water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the water spout is immobile and the water reward is not delivered until they reach the end of the track, there is nothing for the mice to detect during their run. We have added clarifications about the water spout to the Methods and Results sections, along with appropriate discussion points.

      Additionally, we note that the ramping activity of VTA axons is still present on the initial laps with no reward (Krishnan et al., 2022), indicating that this activity is not directly related to the presence or absence of water but is instead associated with the animal’s reward expectation.

      We thank the reviewer for raising this point and hope that these clarifications address their concern.

      Reviewer #3 (Public Review):  

      Summary:  

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.

      Weaknesses:  

      (1) During teleportation between familiar to novel environments the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change, given VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in LC and VTA groups separately?  

      In response to the comment regarding the freezing ratios during teleportation between familiar and novel environments, we have analyzed the freezing ratios and lap velocities of DAT-Cre and NET-Cre mice separately (Fig. 3Aiii). Our analysis shows that the mean lap velocities of both groups overlap in the familiar environment and significantly decrease on the first lap of the novel environment (Fig. 3Aiii, top). For subsequent laps, the velocities in both groups are not statistically significantly different from the familiar environment lap velocities.

      Freezing ratios also show a statistically significant decrease on the first lap of the novel environment compared to the familiar environment in both groups (Fig. 3Aiii, bottom). In the NET-Cre mice, the freezing ratios remain statistically lower in subsequent laps, while in the DAT-Cre mice, the following laps show a similar trend but without statistical significance. This lack of statistical significance in the DAT-Cre mice is likely due to their already lower freezing ratios in the familiar environment. Overall, the data demonstrate similar behavioral responses in the two groups of mice during the switch from the familiar to the novel environment.

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c, 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation, would require this activity profile to be observed in some of the axons recorded in most all LC-CA1 mice.

      We agree with the reviewer’s point. To address this issue, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice by only selecting one LC axon from each mouse to match the VTA dataset.

      Additionally, we have now included the number of recording sessions and the number of mice in which we observed each type of activity. This information has been added to further clarify and support our conclusions.

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of most all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?  

      As above, we have now added the number of mice that had each activity type we report in the paper here.  

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?  

      While we did not vary the presentation order of familiar and novel environments, we recorded the activity of LC axons in some mice when exposed to a dark environment (no VR cues) prior to exposure to the familiar environment. Our analysis of this data demonstrates that LC axons are also active following abrupt exposure to the familiar environment.

      We have added a new figure showing this response (Supplementary Figure 5A) and expanded on our original discussion point that LC axon activity generally correlates with arousal, as this result also supports that interpretation.

      We thank the reviewer for highlighting this important consideration. It certainly helps with the interpretation regarding what LC axons generally encode.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      In addition to what has been described in the public review, I have the following recommendations:  

      The sample size of DA axon recordings should be increased with the use of a single GCaMP for valid conclusions to be made about the lack of novelty-induced activity in these axons.

      We have increased the n of VTA GCaMP6s axons in the familiar environment by including two axons that were recorded in the familiar rewarded condition. We have also conducted an analysis comparing GCaMP6s versus GCaMP7b, which is discussed in detail above.

      Regarding the concerns about valid conclusions of novelty-induced activity in VTA axons, we have added a comment in the discussion to tone down our conclusions regarding the lack of a novelty signal in the VTA axons. This valid concern is discussed in detail above.  

      The title is currently very generic, and non-informative. I recommend the use of more specific language in describing the type of behavior under investigation. It is not clear to the reviewer why 'learning' is included here.  

      Original title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during behavior and learning”

      To make it more specific to the experiments conducted here, we have changed the title to this:

      New title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during navigation in familiar and novel environments”

      Error noted in Figure 4C legend - remove reference to VTA ROIs.  

      The reference to VTA ROIs has been removed from the figure legend.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The concluding sentence of the Abstract could be more specific: which distinct types of information are reflected/'signaled'/'encoded' by LC and VTA inputs to dorsal CA1?  

      The abstract has been adjusted accordingly. The new sentence is more specific: “These inputs encode unique information, with reward information in VTA inputs and novelty and kinematic information in LC inputs, likely contributing to differential modulation of hippocampal activity during behavior and learning.”

      (2) Line 46/47: The study by Mamad et al. (2017) did not quite show that VTA dopamine input to dorsal CA1 'drives place preference'. To my understanding, the study showed that suppression of VTA dopamine signaling in a specific place caused avoidance of this place and that VTA dopamine signaling modulated hippocampal place-related firing. So, please consider rephrasing.  

      Corrected, thanks for pointing this out.

      (3) Legend to Figure 3AIII: 'Each lap was compared to the first lap in F . . .' Could you clarify if 'F' refers to the 'familiar environment'?

      The figure legend has been changed accordingly.

      (4) Line 176: '36 LC neurons' - should this not be '36 imaged axon terminals in dorsal CA1' or something along these lines?  

      This reference has been changed to “LC axon ROIs”

      (5) Line 353: Why was water restriction started before the hippocampal window implant, if behavioral training to run for water reward only started after the implant? Please clarify.

      A sentence was added to the methods to explain that this was done to reduce bleeding and swelling during the hippocampal window implantation.  

      (6) Line 377: '. . . which took 10-14 days (although some mice never reached this threshold).' How many mice did not reach the criterion within 14 days? I think it is not accurate to say the mice 'never' reached the threshold, as they were only tested for a limited period of time.  

      We have added details of how many mice were excluded from each group and the reason why they were excluded.

      (7) Exclusion criteria for imaging data: The authors state (from line 402): 'Imaging sessions with large amounts of drift or bleaching were excluded from analysis (8 sessions for NET mice, 6 sessions for LC Mice).' What exactly were the quantitative exclusion criteria? Were these defined before the onset of the study or throughout the study?  

      Imaging sessions were first qualitatively assessed by looking for disappearance or movement of structures in the Z-plane throughout the imaging FOV. Additionally, following motion correction in suite2p, we used the registration metrics, which plot the first Principal Component of the motion-corrected images, to assess for drift, bleaching, or heat bubbles. If this variable increased or decreased greatly throughout a session, to the point where any apparent activity was not visible in the first PC, the dataset was excluded. We have added these exclusion criteria to the methods section.
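
      The idea behind this check can be sketched as follows, assuming a motion-corrected movie held in a NumPy array. This is not the suite2p implementation, and the drift score (absolute correlation between the first PC time course and frame index) is an illustrative assumption; in the study the first PC was assessed visually rather than against a numerical threshold.

```python
import numpy as np


def first_pc_timecourse(frames):
    """Project each frame onto the first principal component.

    frames: array of shape (n_frames, height, width).
    """
    X = frames.reshape(frames.shape[0], -1).astype(float)
    X -= X.mean(axis=0)  # center every pixel over time
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * S[0]


def drift_score(pc):
    """Absolute Pearson correlation between the PC time course and frame index.

    Hypothetical summary: values near 1 indicate a steady increase or decrease.
    """
    t = np.arange(pc.size)
    return abs(np.corrcoef(t, pc)[0, 1])


# Synthetic example: one movie with a slow linear drift, one with noise only.
rng = np.random.default_rng(0)
ramp = np.linspace(0, 5, 200)[:, None, None]
pattern = rng.random((8, 8))
drifting = ramp * pattern + rng.normal(0, 0.1, (200, 8, 8))
stable = rng.normal(0, 0.1, (200, 8, 8))
print(drift_score(first_pc_timecourse(drifting)))
print(drift_score(first_pc_timecourse(stable)))
```

      A session dominated by z-drift or bleaching produces a first PC that changes monotonically over the recording, whereas in a stable session the first PC is free to capture activity-related variance.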

      Reviewer #3 (Recommendations For The Authors):  

      Please provide a justification or rationale for having two different criteria for immobility (< 5cm/sec) and freezing (<0.2 cm/sec). If VTA and LC axon activities are different between these two velocities, please provide some commentary on this difference.  

      This is a typo leftover from before we converted velocity from rotational units to cm/s.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing editor’s list of items remaining to be addressed followed by our responses/actions:

      (1) The order and organization of supplemental figures and tables is almost impossible to navigate. Please put them in order. 

      All the sections from the previous Supplementary files have been divided into individual Supplementary files so that each can be referenced without confusion from the text. All of the references in the body of the text and the author responses have been updated to reflect this change.

      (2) The question of sample sizes was partially addressed, with authors stating that cell culture work in iPSCs and PGCLCs was done in replicates of 3. Sertoli and granulosa cells were generated from pooled preps - how many individuals, were they littermates? 

      Sertoli and granulosa primary cultures were generated from littermates and each prep used 5 animals (males for Sertoli cells and females for granulosa cells). These changes have been added to the body of the text on pages 39 and 40.

      (3) Authors need to discuss the limitations of doing work in triplicates. Their PCA (Supplement Figure 9) reveals that in several cases samples from the same treatment were not discriminated by PC1 and/or PC2. This is especially true in e and f, the variance of which was explained by PC1 for cell type, but for which treatments showed poor discrimination by PC2. Some discussion of the limitations of sample size should be provided.

      Additional text has been added to what is now Supplementary file 15 to acknowledge this limitation imposed by the limited number of replicates (three) and the ability to resolve the differences in treatments by PCA in subplots e and f. However, we also note that the differences were sufficient to identify significant DMCs/DMRs/DEGs.

      Reviewer 2 also noted a potential weakness that “exposures are more complicated in a whole organism than in an isolated cell line.”

      We note that in our revised manuscript we included wording noting that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      Reviewer #1 (Public Review): 

      Critiques/Comments: 

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      [Addressed on pages: 24-25] – We have added two sentences to the second paragraph of the Discussion section in which we now acknowledge this concern, but also point out that in vitro models of this sort also provide an experimental advantage in that they facilitate a deconvolution of the extensive complexity resident within the intact animal. Nevertheless, we acknowledge that this deconvolution requires ultimate validation of findings obtained within an in vitro model system to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      In response to Reviewer 2’s stated weakness of our study that “The weakness includes the fact that exposures are more complicated in a whole organism than in an isolated cell line,” please note that this added text includes the statement that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Addressed on pages: 39-45 and in new Supplementary file 15 – Additional text has been added in the Methods section to indicate that all samples involving cell culture models which include iPSCs and PGCLCs came from a single XY iPS cell line aliquoted into replicates and all primary cultures which included Sertoli and granulosa cells were generated from pooled tissue preps from mice and then aliquoted into replicates. Finally, all experiments in the study were performed on three replicates. Because this experimental design did indeed allow for comparisons among samples, we have added a new Supplementary file 15 which displays PCA plots showing clustering among control and treatment datasets, respectively, as well as distinctions between each cluster representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      Addressed on pages: 6-7 and in new Supplementary files 1-3  – The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine if any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS to be used for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but are not in a position to offer a mechanistic explanation for this outcome at this time. As the results shown satisfied our primary objectives of demonstrating that exposure of cells in culture to BPS could indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose of as low as 1 µM of BPS was sufficient to induce epimutagenesis, the data obtained satisfied all of the initial objectives of this experiment. That said, in response to the reviewer’s request we have now added text on pages 6-7 alluding to new Supplementary files 1-3 indicating the total number of DMCs and DMRs, as well as the number of DEGs, detected in response to exposure to each dose of BPS shown in Figure 1, as well as stratifying those results to indicate the numbers of hyper- and hypomethylation epimutations and up- and down-regulated DEGs induced in response to each dose of BPS. While, as noted above, investigating the mechanistic basis for the difference in responses induced by the 50 µM versus 1 and 100 µM doses of BPS was beyond the scope of the study presented in this manuscript, we do find this result reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, this result does demonstrate the elevated resolution and specificity of analysis facilitated by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      Addressed on pages: 6-7 and new Supplementary files 1-6 – In general, we observed a coincidence between changes in DNA methylation and changes in gene expression (Supplementary files 1-3). Pertaining directly to the reviewer’s question about the extent to which we observed common DMRs and DEGs across all doses, while we only found 3 overlapping DMRs conserved across all doses tested, we did find an average of 51.25% overlap in DMCs and an average of 80.45% overlap in DEGs across iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, we also found that there was an overlap between DMCs and the promoters or gene bodies of many DEGs (Supplementary file 5). Specifically within gene promoters, we observed a correlation between hypermethylated DMCs and decreased gene expression and hypomethylated DMCs and increased gene expression, respectively (Supplementary file 6).

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      Addressed on pages: 8-9 and new Supplementary file 4 – Although we observed an average of 11.05% overlapping DMCs between different pairs of cell types, we did not observe any DMCs that were shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we examined similarities among the different cell types tested in the occurrence of epimutations within different annotated genomic regions. To better describe this, we have now added additional text to page 9. We have also added more detail to the legend for Figure 2 on page 8 to more clearly explain the significance of the dot sizes and colors, explaining that the dot sizes are indicative of the relative number of differentially methylated probes that were detected within each specific annotated genomic region, and that the dot colors are indicative of the calculated enrichment score reflecting the relative abundance of epimutations occurring within a specific annotated genomic region. The relative score is calculated by iterating down the list of DMCs and increasing a running-sum statistic when encountering a DMC within the specific annotated genomic region of interest and decreasing the sum when the epimutation is not in that annotated region. The magnitude of the increment depends upon the relative occurrence of DMCs within a specific annotated genomic region.
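
      The running-sum statistic described above can be sketched as follows. This is a minimal, unweighted version that assumes the DMC list is already ranked and that each entry is flagged as inside or outside the region of interest; the step sizes (1/number of hits and 1/number of misses) and the function name are illustrative assumptions, not necessarily the weighting used by the software in the study.

```python
def running_sum_enrichment(in_region):
    """Return the signed maximum deviation of the running sum from zero.

    in_region: ranked list of booleans, True where the DMC lies in the
    annotated region of interest.
    """
    n_hit = sum(in_region)
    n_miss = len(in_region) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # the score is undefined without both hits and misses
    step_up = 1.0 / n_hit      # increment for a DMC inside the region
    step_down = 1.0 / n_miss   # decrement for a DMC outside the region
    running, best = 0.0, 0.0
    for hit in in_region:
        running += step_up if hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# Hits concentrated at the top of the ranked list give a positive score.
print(running_sum_enrichment([True, True, False, False]))   # 1.0
# Hits concentrated at the bottom give a negative score.
print(running_sum_enrichment([False, False, True, True]))   # -1.0
```

      Scaling the steps by the number of hits and misses makes the running sum return to zero at the end of the list, so the score reflects where in the ranking the region's DMCs cluster rather than how many there are.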

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      Addressed on page: 29 – We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that the manner in which XX-bearing “female” cells might respond to similar exposures could differ from the responses we observed in XY cells. However, we also noted that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledged that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could accrue.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Addressed on page: 11 and in a new Supplementary file 8 – Previous reports have indicated that BPS does not have the capacity to bind with the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports indicating that BPS can interact with other endocrine receptors including PPARγ and RXRα, which play a role in lipid accumulation and have the potential to be linked to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment we assessed the expression of a panel of hormone receptors including PPARγ, RXRα, and AR in each of the cell types examined in our study and these results are now shown in a new Supplementary file 8. We show that in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested including AR, PPARγ, and RXRα. Thus we now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to exposure to BPS appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). Thus we feel the suggestion that the effects of BPS exposure could conceivably reflect either disruption of canonical or non-canonical signaling in any cell type is well justified and that our data suggest that both of these effects appear to have accrued in the cells examined in our study as suggested in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Addressed on pages: 19-21 – Per the reviewer’s request, we have added text to indicate that Figure 6a is indeed data from all four cell types examined. We have also modified the text to further clarify that Figure 6c displays the expression of other G protein-coupled receptors which are expressed at similar, if not higher, levels than either ER in all cell types examined, and that these have been shown to have the potential to bind to either 17β-estradiol or BPA in rat models. As alluded to by the reviewer, this is indicative of a wide variety of distinct pathways and/or functions that can potentially be impacted by exposure to an EDC such as BPS. Thus, we have attempted to acknowledge the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved with a wide variety of different pathways and functions. Importantly, this illustrates the strength of our model system in that it can be used to identify potentially impacted target pathways that can then be pursued further as deemed appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      Addressed on page: 22 and in new Supplementary files 13 and 14 – We have now added a new supplemental Excel file (Supplementary file 13) that lists the 138 overlapping conserved DEGs that did not become reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplementary file 14 which displays KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved with cell cycle and apoptosis pathways which, interestingly, have the potential to be linked to cancer development, which in turn is often linked to disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      Addressed on page: 6 – We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      Addressed on page: 37 – We have added text detailing that all mice used in the project were bred onsite, that the water was non-autoclaved conventional RO water, and that the mice were fed 5V5R extruded feed, which is highly controlled for the presence of isoflavones and is certified for use in estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters.

      Strengths: 

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation. 

      Weaknesses: 

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line. 

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors. 

      Reviewer #2 (Recommendations For The Authors): 

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required. 

      Specific comments: 

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      Addressed on page: 2 – We have modified the text at the end of the abstract to more precisely reflect our intended conclusions based on our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and now our data provide novel insight into the fate of induced epimutations during this process. Specifically, we show that a prevalence of epimutations was conserved during the iPSC → germ cell transition but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, but this was accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. We suggest, based on other recent reports in the literature, that this is a result of the BPS exposure inducing changes in the chromatin architecture in the exposed iPSCs such that when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template there is both correction of many existing epimutations and the genesis of many novel epimutations. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the occurrence of epigenetic reprogramming during each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      Addressed on pages: 3, 5, and 26 – We have now made a point of noting that our reference to an EPA-recommended “safe dose” of BPA was for humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction section. In addition, we have added text at the end of the fourth paragraph of the Discussion section acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal given the potential for mitigating metabolic effects in the latter. However, we also note that the ability we demonstrated to quantify the effects of such exposures on the basis of numbers of epimutations (DMCs or DMRs) induced could potentially be used in future studies to study this question by assessing the effects of a specific dose of a specific EDC on a specific cell type when exposed either within a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Addressed on page: 6-7 and in Supplementary files 1-5 – Please see our response to Reviewer 1 critique #4 above where we address similar concerns. While we do find overlap among different cell types with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the effect to be only partially additive as opposed to synergistic in any apparent manner. The fold increase in DMCs, DMRs, and DEGs resulting from exposure to doses of 1 μM or 50 μM ranged from 2.5x to 4.4x, which was well below the 50x increase that would have been expected from a strictly additive effect, and the effect increased even less, if at all, in response to exposure to doses of 50 μM versus 100 μM BPS. Finally, as now noted in the Discussion section on page 25, our conclusion is that these results display a limited dose-dependent effect that was partially additive but also plateaued at the highest doses tested.
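
      The comparison in this response can be written out as a short arithmetic sketch. The function names are hypothetical and serve only as illustration; the 1 μM and 50 μM doses, the 2.5x to 4.4x observed range, and the 50x expectation are the figures quoted above.

```python
def expected_fold_if_proportional(low_dose_um, high_dose_um):
    """Fold increase expected if the response scaled in direct proportion to dose."""
    return high_dose_um / low_dose_um


def fraction_of_expected(observed_fold, expected_fold):
    """Share of the proportional expectation that was actually observed."""
    return observed_fold / expected_fold


# Doses quoted in the response (1 uM vs 50 uM BPS).
expected = expected_fold_if_proportional(1.0, 50.0)

# Observed fold increases in DMCs, DMRs, and DEGs quoted in the response.
observed_range = (2.5, 4.4)
fractions = [fraction_of_expected(f, expected) for f in observed_range]

print(f"expected fold if strictly proportional: {expected:.0f}x")
for fold, frac in zip(observed_range, fractions):
    print(f"observed {fold}x is {frac:.1%} of that expectation")
```

      Under this reading, the observed increases amount to less than a tenth of what a strictly dose-proportional response would give, which is consistent with the partially additive, plateauing effect described in the response.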

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Addressed on pages: 39-45 and in new Supplementary file 15 –  Please see our response to Reviewer 1 critique #2 where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplementary file 15.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of more meaningful analysis, one can plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Addressed on pages: 10 and 39-45 and in new Supplementary files 1-5, 15 –  Please see our response to Reviewer 1 critique #2 above where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array are identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated co-variates. CpG probes on the array were defined as having differential changes if they met both the p-value and FDR (≤ 0.05) significance thresholds between treatment and control samples for each cell type analyzed. The range of medians across all samples was 0.0278 to 0.0059 for hypermethylated beta values and -0.0179 to -0.0033 for hypomethylated beta values. As noted above, we did observe an overlap in DMCs between cell types. Thus, we observed an average of 11.05% overlapping DMCs between two or more cell types but we did not observe any DMCs shared between all four cell types. We have added additional text on page 9 and new Supplementary files 1-5 to now more clearly describe that this limited similarity in direct overlap of DMCs was the underlying motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
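
      A minimal sketch of the dual-threshold call described above may help readers see how a probe can pass the raw p-value cutoff yet fail the FDR cutoff. It assumes a Benjamini-Hochberg adjustment, which the response does not specify, and it does not reproduce the mixed linear model that generates the p-values; the function names and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmcs(p_values, alpha=0.05):
    """Flag probes whose raw p-value and adjusted p-value are both <= alpha."""
    q_values = benjamini_hochberg(p_values)
    return [p <= alpha and q <= alpha for p, q in zip(p_values, q_values)]


# Hypothetical per-probe p-values for one cell type (treated vs control).
probe_p = [0.001, 0.008, 0.039, 0.041, 0.6]
probe_q = benjamini_hochberg(probe_p)
is_dmc = call_dmcs(probe_p)

for p, q, flag in zip(probe_p, probe_q, is_dmc):
    print(f"p={p:.3f}  adjusted={q:.5f}  DMC={flag}")
```

      In this toy input the third and fourth probes have raw p-values below 0.05 but adjusted values just above it, so only the first two would be called as DMCs.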

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      Addressed in a new Supplementary file 7 – We have added a new Supplementary file 7 in which we provide data addressing this question. In brief, we find that the delta betas of probes enriched at enhancer regions and associated with relative proximity to ERE elements in Sertoli cells, granulosa cells, and iPSCs appear very similar to those associated with DMCs not located within these enriched regions. However, when we compared the similarity of the two data sets with goodness of fit tests, we found these relatively small differences were, in fact, statistically significant based on a two-sample Kolmogorov-Smirnov test. These observed significant differences appear to indicate that there is higher variability among the delta betas associated with hypomethylation, but not hypermethylation, changes occurring at DMCs associated with enhancers, potentially suggesting a greater tendency for exposure to BPS to induce hypomethylation rather than hypermethylation changes, at least in these specific regions.
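
      For readers unfamiliar with the test named above, the following sketch computes the two-sample Kolmogorov-Smirnov statistic, that is, the largest vertical gap between the two empirical distribution functions. It is an illustration only: the delta-beta values are invented, the function name is hypothetical, and no p-value is computed here (in practice a library routine such as SciPy's ks_2samp would supply one).

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Invented delta-beta values, for illustration only.
enhancer_dmcs = [-0.021, -0.017, -0.012, -0.009, 0.006, 0.011]
other_dmcs = [-0.015, -0.011, -0.008, 0.007, 0.012, 0.019]

d_stat = ks_statistic(enhancer_dmcs, other_dmcs)
print(f"KS statistic D = {d_stat:.3f}")
```

      Because the significance of a given D grows with sample size, distributions that look very similar can still differ significantly when many probes are compared, which matches the pattern reported in the response.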

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      Addressed in a new Supplementary file 11. We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs that were either associated with enhancer regions in Sertoli, granulosa, or iPS cells or transcription factor binding sites in PGCLCs for the identification of higher abundance motif sequences. We then compared any motifs identified with the JASPAR database to potentially find transcription factors that could be binding to these regions. Interestingly we found that the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      Addressed on pages 6, 7, and 17 and in a new Supplementary file 6 – We analyzed the relationship between DMCs at DEG promoter regions and the corresponding change in expression of that DEG. Our data support a relationship between up-regulated genes showing decreased methylation in promoter regions and down-regulated genes showing increased methylation at promoter regions, although there were some exceptions to this relationship.

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      Addressed on page 22 – We now note in the text on page 22 that EM-seq was used during experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions to gather higher-resolution data on changes in DNA methylation at the whole-epigenome level.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134. doi:10.1016/J.ENVINT.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/TOXSCI/KFZ173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2011. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/acs.est.3c09779

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the authors also analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and established multiple combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors can not suggest that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation (line 118-9). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

      We are glad that the reviewer appreciates our work and are grateful for the positive comments and the constructive suggestion. We agree with the reviewer that our methodology by itself cannot suggest whether the lack of LGE/CGE-derived-OLs in the neocortex is caused by competitive postnatal elimination or not. That is why we cited a parallel work by Li et al. (ref [17] in the original manuscript; ref [19] in the revised manuscript), in which in utero electroporation (IUE) failed to label LGE-derived OL lineage cells in both embryonic and early postnatal brains. Although they did not directly explore CGE using IUE, their fate mapping results using Emx1-Cre; Nkx2.1-Cre; H2B-GFP at P0 and P10 revealed a very low percentage of LGE/CGE-derived OL lineage cells. The lack of adult labeling in our study together with the lack of developmental labeling in the other study prompted us to hypothesize that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation. In the revised manuscript, we have expanded the discussion to explain this point more clearly.

      Reviewer #2 (Public Review):

      [...] Strengths:

      The strength and novelty of the manuscript lies in the elegant tools generated and used and which have the potential to elegantly and accurately resolve the issue of the contribution of different progenitor zones to telencephalic regions.

      We are glad that the reviewer appreciates our work and are grateful for the overall positive comments.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      We thank the reviewer for this constructive suggestion. We have replaced the density quantification (Figure 2F and 3D in the original manuscript) with contributions to the total OL population (% of ASPA) (Figure 2J and 2N in the revised manuscript).

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      The quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. We apologize for not having stated and presented this information clearly enough, and for any confusion it may have caused. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200*) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B)." (lines 111, 112) is not really accurate as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winkler et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

      As explained in the response to comment (2), our original quantifications included different rostro-caudal regions of the cortex. In the revised manuscript, we have added more schematics and representative images in the Supplementary Figure 2 for better illustration to resolve the concern of comprehensiveness.

      We thank the reviewer for listing and summarizing highly relevant published research along with the parallel study by Li et al. submitted to eLife. We apologize for the omission of the first two references in our original manuscript and have cited them in appropriate places (ref [10] and ref [11] in the revised manuscript). However, we believe these works do not compromise the novelty and significance of our work for the following reasons:

      (1) Tripathi et al. 2011 (ref [10] in the revised manuscript) analyzed OL lineage cells in the corpus callosum and the spinal cord, but not in the cortex and anterior commissure. Their analysis was performed in juvenile mice (P12/13), not in adulthood. Most importantly, their analysis of ventrally derived OL lineage cells relied on lineage tracing using Gsh2Cre, which in fact also labels OLs derived from Gsh2+ dorsal progenitors. In contrast, we analyzed mature OLs in the cortex, corpus callosum and anterior commissure in 2-month-old adult mice. We used intersectional and subtractive strategies to label OLs derived from dorsal, LGE/CGE and MGE/POA origins. Our strategy differentiated the two different ventral lineages (LGE/CGE vs. MGE/POA) and avoided mixed labeling of OLs from ventral and dorsal Gsh2+ progenitors.

      (2) Winkler et al. 2018 (ref [11] in the revised manuscript) analyzed OLs derived from dorsal progenitors but only quantified those in the gray matter and the white matter of somatosensory cortex. Their quantification relied on co-staining with Olig2/Sox10, and thereby included both oligodendrocyte precursors (OPCs) and OLs. In contrast, we analyzed mature OLs from three origins and quantified not only neocortical regions (Mo and SS) but also an archicortical region (Pir). Our analysis revealed that although dorsally derived OLs dominate neocortex, ventrally derived OLs, especially the LGE/CGE-derived ones, dominate piriform cortex.

      (3) Orduz et al. 2019 (ref [7] in the original manuscript and the revised manuscript) mainly focused on POA-derived OLs in the somatosensory cortex. Although they performed limited analysis on MGE/POA-derived OPCs at postnatal day 10 and 19, no quantification of MGE/POA-derived OLs was performed in terms of their density, contribution to the total OL population and spatial distribution in the cortex. In contrast, we performed systematic quantification on these aspects to demonstrate that MGE/POA-derived OLs make small but sustained contribution to cortex with a distribution pattern distinctive from those derived from the dorsal origin.

      (4) Li et al. 2024 (ref [17] in the original manuscript and [19] in the revised manuscript) is a parallel study submitted to eLife. Their independent discoveries and ours complement each other nicely. Using different sets of techniques and experiments but some shared genetic mouse models, we both found that LGE/CGE made a minimal contribution to neocortical OLs. Their analysis in the prenatal and early postnatal stages together with our analysis in the adult brain painted a more comprehensive picture of cortical oligodendrogenesis. The uniqueness of our work is that we performed systematic quantification of all three origins and uncovered the differential contributions to neocortex, piriform cortex, corpus callosum and anterior commissure.

      In summary, our work developed novel strategies to faithfully trace OLs from the three different origins and performed systematic analysis in the adult brain. Our data uncovered their differential contributions to neocortex, piriform cortex and the two commissural white matter tracts, which significantly differ not only from the canonical view but also from other previous studies in aspects discussed above. We believe our discoveries did significantly revise the canonical model of forebrain OL origins and provided a new and more comprehensive view.

      Reviewer #3 (Public Review):

      [...] Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 exhibits high enrichment in the LC during early embryonic development. The presence of a small population of Gsh2-positive cells in the late embryonic cortex could originate/migrate from Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derived from Gsh2+ progenitors in LC could not be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial as the results from the combined Cre lines could appear similar to only one Cre line active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

      We thank the reviewer for these thoughtful comments. We agree with the reviewer that the presence of Gsh2-positive cells in the late embryonic cortex by itself could not rule out the possibility that they originate/migrate from Gsh2-positive cells in the LC at earlier stages. Staining dorsal-lineage intermediate progenitors with Gsh2, or performing intersectional lineage tracing using Gsh2Cre along with a dorsal-specific Flp driver, would provide more direct evidence on this issue. Nonetheless, as our lineage tracing of LGE/CGE-derived OLs did not employ Gsh2Cre, the doubt on the identity of Gsh2+ cortical progenitors should not affect the interpretation of our data.

      Regarding the subtractional LCOL labeling strategy used in our study, we wonder if there was any misunderstanding by the reviewer. As stated in our manuscript (line 59-61) and reiterated by the reviewer, OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG labels OLs derived from progenitors that express neither Emx1Cre nor Nkx2.1Cre. As these two progenitor pools do not overlap with each other, there is a purely additive effect of their actions. If there is any concern about efficiency and specificity, it would be inadequate Cre-mediated recombination that leads to mislabeling of dOLs or MPOLs as LCOLs (i.e., OLs derived from Emx1 or Nkx2.1-expressing progenitors were not successfully “subtracted” and thereby “wrongly” retained RFP expression). Therefore, the bona fide LGE/CGE-derived OLs would only be fewer but not more than RFP+ LCOLs labeled by our subtractional strategy, even if any of the Cre lines did not work efficiently enough. In any case, this would not affect our conclusion that LGE/CGE-derived OLs make a minimal contribution to neocortex, as the “ground truth” contribution by LGE/CGE could only be less but not more than what we have observed using the current strategy.
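
      The logic of this argument can be sketched as a small truth table. This is a simplified model written for illustration, not the authors' analysis code: it assumes that an OpalinFlp-expressing mature OL reports tdTomato (RFP) unless Cre recombination has occurred, in which case it reports GFP instead, and the function names and the toy cell list are hypothetical.

```python
from typing import Optional


def fltg_reporter(opalin_flp: bool, emx1_cre: bool, nkx21_cre: bool) -> Optional[str]:
    """Simplified read-out of the dual-recombinase reporter for a single cell."""
    if not opalin_flp:
        return None          # not a mature OL: reporter stays silent
    if emx1_cre or nkx21_cre:
        return "GFP"         # Cre lineage history: cell is "subtracted"
    return "tdTomato"        # neither Cre lineage: scored as an LCOL


def score_cell(origin: str, cre_recombined: bool = True) -> Optional[str]:
    """Label a mature OL of a given origin; cre_recombined=False models Cre failure."""
    emx1 = origin == "dorsal" and cre_recombined
    nkx21 = origin == "MGE/POA" and cre_recombined
    return fltg_reporter(True, emx1, nkx21)


# Toy population: (true origin, whether Cre recombination succeeded).
cells = [
    ("dorsal", True),
    ("dorsal", False),    # inefficient Cre: dorsal OL wrongly keeps tdTomato
    ("MGE/POA", True),
    ("LGE/CGE", True),
]

labels = [score_cell(origin, ok) for origin, ok in cells]
red_count = labels.count("tdTomato")
true_lc_count = sum(origin == "LGE/CGE" for origin, _ in cells)

print(labels)
print(f"tdTomato+ cells: {red_count}, true LGE/CGE-derived OLs: {true_lc_count}")
```

      In this model, Cre inefficiency can only add cells to the tdTomato-positive count, so the observed count is an upper bound on the true LGE/CGE-derived population, which is the point the response makes.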

      In support of our conclusion, a parallel study by Li et al. 2024 (ref [17] in the original manuscript; ref [19] in the revised manuscript) also provided independent experimental evidence that “any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope (quoted from its eLife assessment).” In addition, in their revision, they performed Gsh2 immunostaining in P0 Emx1Cre::HG-loxP mouse and found nearly all Gsh2+ cells in the cortical SVZ were derived from the Emx1+ lineage. We are glad that this additional piece of evidence further clarified the case, but still want to emphasize that the subtractional strategy we took was designed purposefully to avoid the potential uncertainty of Gsh2Cre and to more faithfully label LGE/CGE-derived OLs. Therefore, the validity of our conclusion about the fate of LC-derived OLs should be independent from the question on the identity of Gsh2+ cortical progenitors and stands well by itself.

      We hope that these explanations have adequately addressed the reviewer’s concerns. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In Figures 2C, 2D, 2E and 3D, the authors should provide counts of labelled cells as a % of ASPA+ cells. This will give an accurate picture of the contribution of the different progenitor regions to OLs.

      The graphs in Figure 2F are unnecessary since they are simply repeats of C-E but re-arranged.

      We thank the reviewer for the valuable suggestions. These two recommendations are related, so we made the following changes. We replaced the density quantification in Figure 2F and 3D with % of ASPA (Figure 2J and 2N in the revised manuscript) to give an accurate picture of the contribution of the different progenitor regions to OLs, as suggested by the reviewer. We retained the density counts in Figure 2C-E (Figure 2G-I in the revised manuscript). Together with the quantifications of rostral-caudal and laminar distributions presented in Supplementary Figure 2, these data demonstrate that OLs from different origins display distinct spatial distribution patterns.

      At what ages were the quantifications performed in all the figures?

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section of the revised manuscript.

      In 2D, and 3B the GFP should have been activated but the authors do not show it or quantify it presumably because GFP would flood the sections in the presence of Emx1Cre. Nevertheless, since eGFP is shown in the diagram in 2B, the authors should mention why they chose not to show it.

      We thank the reviewer for the helpful comment and suggestion. We have modified the schematic in Figure 2B and added an explanation in the figure legend (lines 308-313). We have also added a schematic in Supplementary Figure 1A along with images of the GFP channel in Supplementary Figure 1D (lines 338-350).

      All the main figures and supplementary figures are too small to see properly.

      We are sorry that the images were severely compressed in the combined manuscript file at the conversion step during the initial submission. We apologize for the compromised image quality; we re-uploaded full-size figures as individual files on bioRxiv soon after receiving the reviews. For the revised manuscript, we have also taken care to upload full-size, high-resolution figures as individual files to ensure their quality of presentation.

      Supplementary Figure 2E is unnecessary and perhaps misleading the reader that cortical-derived OLs have a preference for the lower layers whereas the distribution may simply reflect the distribution of OLs in the cortex.

      We thank the reviewer for the helpful comment and suggestion. We have removed this panel and replaced it with quantifications of the relative laminar distributions of the total (ASPA+) OLs along with those from the three different origins (Supplementary Figure 2G in the revised manuscript). Indeed, the preference of dorsally derived OLs for the lower layers mirrored the distribution of total OLs in the cortex, whereas the MGE/POA-derived OLs deviated significantly from the others and exhibited a higher preference for layer 4.

      Quantification of labelled cells as a % of ASPA should also be performed in Supplementary Figure 3.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included quantifications of labelled cells as % of ASPA for both OpalinFlp::Emx1Cre::Ai65 and OpalinFlp::Nkx2.1Cre::Ai65 (Figure 2J and N). The sum of these two data sets is equivalent to the data for OpalinFlp::Emx1Cre::Nkx2.1Cre::Ai65 shown in Supplementary Figure 3, so we did not perform additional quantifications, to avoid redundant effort.

      Imaging and quantification should be extended to more posterior regions of the cortex to find out whether the contribution is different from the areas already examined.

      We thank the reviewer for the suggestion on imaging and apologize for the confusion about the range of quantification. As explained in our response to comment (2) under weaknesses, the quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (lines 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should provide Opalin reporter expression data across various brain regions at different developmental stages to clarify the expression pattern of the reporter.

      We appreciate the reviewer’s comment. We chose to perform all quantifications in adult mice because Opalin is a well-established marker for differentiated OLs and the recombinase-dependent reporter expression is cumulative and irreversible. Any non-specific labeling at an earlier developmental stage would be retained and would therefore also be visible at the timepoint we examined. In other words, the fact that we detected labeling only in mature OLs, and no non-specific labeling, in the current dataset indicates that no non-OL labeling was present at earlier timepoints. As shown in Figure 1D-F, reporter expression activated by the Opalin driver is highly OL-specific in all analyzed brain regions. This is further corroborated by results from combinatorially labeled samples (Figure 2 and Supplementary Figure 2), in which only OLs, and no other cell types, were labeled in all analyzed brain regions. Following the reviewers’ suggestions, we have added representative images of more rostral and more caudal cortical regions (Supplementary Figure 2B-D), which also showed highly specific OL labeling.

      (2) In Figure 1D, please specify the developmental stage of the mice used for staining.

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information to the “Material and Methods” section (lines 199-200) of the revised manuscript.

      (3) The authors should clarify if the Opalin reporter expressed in OPCs and astrocytes at developmental stages of mice, such as P0, P7, and P30.

      We appreciate the reviewer’s comment, but as explained in our response to comment (1), Opalin is a well-established marker for differentiated OLs and is not expressed in OPCs or astrocytes. As shown in Figure 1D-E, reporter expression is confined to CC1+ differentiated OLs, with no colocalization with Sox9 (an astrocyte marker). Consistent with this observation, only ASPA+ differentiated OLs, and no OPCs or astrocytes, were labeled in any of the combinatorial lineage tracing samples generated using this line combined with progenitor-Cre lines. In addition to marker staining, we did not observe any RFP+ cells with OPC or astrocyte morphology. As the recombinase-dependent reporter expression is cumulative and irreversible, the fact that no non-specific labeling was observed in the adult brain retrospectively demonstrates the specificity of Opalin-Flp at earlier developmental stages.

      (4) In Figure 1E, authors should address why the efficiency of the tdTomato line is notably lower compared to that of H2B-GFP and whether the stability of reporters could impact the conclusions drawn.

      The difference in reporting efficiency is mainly caused by differences inherent to the two reporter systems. The TRE-RFP reporter is derived from Ai62, composed of a Tet response element and tdTomato inserted into the T1 TIGRE locus. The tdTomato expression is driven by tTA-TRE transcriptional activation. The HG-FRT reporter is derived from HG-Dual, composed of a CAG promoter, a frt-flanked STOP cassette, and H2B-GFP inserted into the Rosa26 locus. The H2B-GFP expression is driven by the CAG promoter after Flp-mediated removal of the STOP cassette. A Flp-dependent tdTomato reporter designed in the same way as the HG-FRT reporter would have similar efficiency. In fact, the RC::FLTG reporter can be viewed as such a reporter in the absence of Cre; it did show efficiency as high as that of HG-FRT and supported efficient subtractive labeling of LGE/CGE-derived OLs. We apologize for a typo in the Y-axis title of the right panel in the original Figure 1F, which may have caused misunderstanding. The “RFP+CC1+/CC1” should be “XFP+CC1/CC1”. We have corrected this mistake and revised the figure legend for a clearer description of the data (lines 293-302 in the revised manuscript).

      (5) In Figure 2, please clarify the developmental stage of the mice used for staining. Authors should present the eGFP image in addition to tdTOM.

      We apologize for the omission of the age information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information to the “Material and Methods” section (lines 199-200) of the revised manuscript. We thank the reviewer for the suggestion regarding the eGFP image and have presented it in Supplementary Figure 1 in the revised manuscript.

      (6) in Figure 2D, authors should display the eGFP image alongside the tdTomato image. It is difficult to assess the efficiency of Emx-Cre and Nkx2.1-Cre.

      We thank the reviewer for the suggestion regarding the eGFP image and have presented it in Supplementary Figure 1D in the revised manuscript. There are two reasons why we chose to present it in the supplementary figure instead of the main figure. First, we added ASPA staining in the green channel along with quantifications of RFP cells as % of ASPA in Figure 2 in the revised manuscript, following reviewer #2’s suggestion. Second, as pointed out by reviewer #2, GFP would flood the sections in the presence of Emx1Cre and could be quite distracting if shown together with RFP.

      We were not entirely sure what the reviewer meant by “assess the efficiency of Emx-Cre and Nkx2.1-Cre”, but we believe that the quantifications of RFP cells as % of ASPA clarify the contribution of each origin to the total OLs (Figure 2J and 2N in the revised manuscript).

      (7) Figure 3 depicts the entire brain, replicating the image presented in Figure 2. It would be beneficial to consolidate Figures 2 and 3, as they showcase identical brain scans of different regions.

      We thank the reviewer for the constructive suggestion and have consolidated Figures 2 and 3 in the original manuscript into Figure 2 in the revised manuscript.